GPU Usage
Face recognition is computationally intensive, so Face SDK modules that run deep learning algorithms support GPU acceleration.
You can use GPU acceleration on:
- Windows x86 64-bit
- Linux x86 64-bit
- Android
- NVIDIA Jetson (JetPack 4.3/4.4)
In this section you'll learn:
- which Face SDK modules GPU acceleration is available for
- how to enable GPU acceleration
- timing characteristics for Face SDK modules with CPU and GPU usage
- possible errors during GPU usage, and relevant solutions.
Desktop
Currently, GPU acceleration is available for the following modules (single GPU mode only):
- Recognition methods (11v1000, 10v30, 10v100, 10v1000, 9v30mask, 9v300mask, 9v1000mask) (see Facial Recognition)
- Detectors (BLF, REFA, ULD) (see Face Capturing)
To run models on GPU, edit the appropriate recognizer configuration file: change the `use_cuda` parameter from `0` to `1`.
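The edit described above can also be scripted. The following is a minimal sketch, assuming the recognizer configuration file is XML and stores the flag as a `<use_cuda>` element; the `<config>` root element and the `set_flag` helper are hypothetical and not part of the Face SDK API.

```python
import xml.etree.ElementTree as ET


def set_flag(xml_text: str, name: str, value: int) -> str:
    """Return xml_text with the text of every <name> element set to value."""
    root = ET.fromstring(xml_text)
    matches = list(root.iter(name))
    if not matches:
        raise KeyError(f"no <{name}> element found")
    for element in matches:
        element.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical configuration fragment; a real file contains more settings.
sample = "<config><use_cuda>0</use_cuda></config>"
updated = set_flag(sample, "use_cuda", 1)
print(updated)
```

To apply this to a real file, read its contents, pass them to `set_flag`, and write the result back after making a backup copy.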
Windows/Linux
General requirements:
Software requirements:
Windows:
- NVIDIA GPU Driver >= 441.22
- Microsoft Visual C++ Redistributable for Visual Studio 2019
Linux:
- NVIDIA GPU Driver >= 440.33
Hardware requirements:
- CUDA compatible GPU (NVIDIA GTX 1050 Ti or better)
For CUDA 10:
Software requirements:
Windows:
Linux:
Hardware requirements:
- CUDA compatible GPU (NVIDIA GTX 1050 Ti or better, but older than the RTX 30xx series)
For CUDA 11:
Software requirements:
Windows:
Linux:
Hardware requirements:
- CUDA compatible GPU (NVIDIA GTX 1050 Ti or better, up to and including RTX 3090)
Other:
- Move all libraries from the /version/bin/cuda11/ directory to the /bin/ directory, replacing the existing files.
You can also use pre-built Docker images with CUDA support, such as nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04 (note that some licenses may be unavailable in this case).
GPU acceleration runs on one of the available GPUs (by default, the GPU with index `0`). The GPU index can be changed in either of the following ways:
- via the `gpu_index` parameter in the recognizer configuration file
- via the `CUDA_VISIBLE_DEVICES` environment variable (see more info about CUDA Environment Variables)
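As an illustration of the environment-variable option, the sketch below builds a process environment that exposes only one physical GPU to CUDA. `CUDA_VISIBLE_DEVICES` is a standard CUDA variable; the `env_for_gpu` helper and the sample `PATH` value are hypothetical.

```python
import os
from typing import Dict, Optional


def env_for_gpu(index: int, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and restrict CUDA to the GPU with the given index."""
    if index < 0:
        raise ValueError("GPU index must be non-negative")
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = str(index)
    return env


# Build an environment that exposes only physical GPU 1.
env = env_for_gpu(1, base={"PATH": "/usr/bin"})
# Pass it when launching the application, e.g. subprocess.run([...], env=env).
```

CUDA renumbers the visible devices starting from 0, so a process launched with this environment sees the selected GPU as device 0. The variable must be set before the process initializes CUDA.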
NVIDIA Jetson
System requirements:
- JetPack 4.3 or 4.4*
* Tests were performed on the Jetson TX2 and Jetson NX modules.
You can select the `jetson_jetpack_4.3_4.4` component in the Face SDK installation wizard. By default, it uses the build for JetPack 4.4. If a build for JetPack 4.3 is required, move all files from the lib/jetpack-4.3 directory to the lib directory.
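The file move for JetPack 4.3 can be done by hand or scripted. A minimal sketch follows, assuming the paths are relative to the Face SDK installation directory and that same-named files in the destination should be overwritten; the `move_all_files` helper is hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def move_all_files(src_dir: Path, dst_dir: Path) -> List[str]:
    """Move every file from src_dir into dst_dir, replacing same-named files."""
    moved = []
    for item in sorted(src_dir.iterdir()):
        if item.is_file():
            target = dst_dir / item.name
            if target.exists():
                target.unlink()  # remove the old copy so the move replaces it
            shutil.move(str(item), str(target))
            moved.append(item.name)
    return moved


# Example call, run from the SDK installation directory:
# move_all_files(Path("lib/jetpack-4.3"), Path("lib"))
```

Back up the lib directory first if you may need to switch back to the JetPack 4.4 build.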
Test Results
The table below shows the time spent on extraction of one biometric template using CPU and GPU:
Method | GPU | CPU |
12v1000 | 47 ms | 442 ms |
9v300 | 10 ms | 292 ms |
12v100 | 8 ms | 49 ms |
12v50 | 6 ms | 21 ms |
12v30 | 5 ms | 12 ms |
Note: The speed test used an NVIDIA GeForce GTX 1070 GPU and an Intel Core i5-9400 4.0GHz CPU.
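To compare methods, it can help to express the timings as speedup factors (CPU time divided by GPU time). The short sketch below does this arithmetic with the values copied from the table above; it measures nothing itself.

```python
# (GPU ms, CPU ms) per method, copied from the table above.
timings_ms = {
    "12v1000": (47, 442),
    "9v300": (10, 292),
    "12v100": (8, 49),
    "12v50": (6, 21),
    "12v30": (5, 12),
}

# Speedup factor: how many times faster the GPU extraction is.
speedup = {method: cpu / gpu for method, (gpu, cpu) in timings_ms.items()}

for method, factor in speedup.items():
    print(f"{method}: {factor:.1f}x faster on GPU")
```

On this hardware the gain ranges from about 2.4x for 12v30 to about 29x for 9v300, so the heavier methods benefit most from the GPU.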
Troubleshooting
Error | Solution |
Assertion failed (Cannot open shared object file libtensorflow.so.2) | Make sure the library file libtensorflow.so.2 is in the same directory as the libfacerec.so library you are using |
Assertion failed (Cannot open shared object file tensorflow.dll) | Make sure the library file tensorflow.dll is in the same directory as the facerec.dll library you are using |
Slow initialization | Increase the default JIT cache size: `export CUDA_CACHE_MAXSIZE=2147483647` (see JIT Caching) |
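The first two errors in the table come down to a library file missing from the directory that holds the Face SDK library. A quick check can be sketched as follows; the `missing_siblings` helper and the example path are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def missing_siblings(library_path: str, required: Iterable[str]) -> List[str]:
    """Return the names in `required` that are absent from the library's directory."""
    directory = Path(library_path).parent
    return [name for name in required if not (directory / name).is_file()]


# Example call on Linux (adjust the path to your installation):
# missing_siblings("/path/to/lib/libfacerec.so", ["libtensorflow.so.2"])
```

An empty list means the required files are in place; on Windows, check for tensorflow.dll next to facerec.dll in the same way.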
Android
Currently, GPU acceleration is available for the following modules:
- Recognition methods (9v30, 9v300, 9v1000, 9v30mask, 9v300mask, 9v1000mask) (see Facial Recognition)
- The blf detector (see Face Capturing)
GPU usage can be enabled or disabled via the `use_mobile_gpu` flag in the configuration files of the `Capturer`, `Recognizer`, and `VideoWorker` objects (in the configuration file of the `VideoWorker` object, the flag controls GPU usage for detectors). By default, mobile GPU support is enabled (the value is `1`). To disable GPU usage, set the `use_mobile_gpu` flag to `0`.
Test Results
The table below shows the time spent on extraction of one biometric template using CPU and GPU:
Method | CPU | GPU |
9v1000 | 3660 ms | 610 ms |
9v300 | 1960 ms | 280 ms |
9v30 | 170 ms | 70 ms |
Note: The speed test was performed on a Google Pixel 3.