GPU usage
Because face recognition is computationally demanding, Face SDK supports GPU acceleration for the modules that run deep learning algorithms.
In this section you'll learn:
- which Face SDK modules GPU acceleration is available for
- how to enable GPU acceleration
- timing characteristics for Face SDK modules with CPU and GPU usage
- possible errors during GPU usage, and relevant solutions.
Desktop
System requirements for GPU usage
Currently, GPU acceleration is available for the following modules (single GPU mode only):
- Recognition methods (12v30, 12v50, 12v100, 12v1000, 11v1000, 10v30, 10v100, 10v1000, 9v30mask, 9v300mask, 9v1000mask) (see Facial Recognition)
- Detectors (BLF, REFA, ULD) (see Face Capturing)
To run models on GPU, edit the appropriate recognizer configuration file: change the `use_cuda` parameter from `0` to `1`.
To run processing on CUDA 10.1, edit the object configuration file by adding the `use_legacy` field with a value of `1`.
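The configuration edits described above can also be scripted. The following is a minimal sketch, assuming the configuration file is an XML document whose parameters are direct children of the root element (for example `<use_cuda>0</use_cuda>`). That layout, the `<config>` root name, and the helper `set_config_field` are assumptions made for illustration and are not part of the Face SDK API, so check the actual file before relying on it.

```python
import xml.etree.ElementTree as ET


def set_config_field(xml_text: str, name: str, value: int) -> str:
    """Set <name> to value under the root element, adding it if absent.

    Assumes parameters are direct children of the root element
    (hypothetical layout; verify against your configuration file).
    """
    root = ET.fromstring(xml_text)
    field = root.find(name)
    if field is None:
        # The field is missing (e.g. use_legacy), so append it to the root.
        field = ET.SubElement(root, name)
    field.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical configuration content, used only for this example.
sample = "<config><use_cuda>0</use_cuda></config>"
updated = set_config_field(sample, "use_cuda", 1)     # enable GPU
updated = set_config_field(updated, "use_legacy", 1)  # CUDA 10.1 mode
print(updated)
```

The helper works on a string rather than a file path so that it can be tried without touching a real configuration file; reading and writing the file is left to the caller.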
GPU acceleration runs on one of the available GPUs (by default, the GPU with index `0`). The GPU index can be changed as follows:
- via the `gpu_index` parameter in the recognizer configuration file
- via the `CUDA_VISIBLE_DEVICES` environment variable (see more info about CUDA Environment Variables)
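As an illustration of the environment-variable route, the sketch below builds an environment for a child process in which only one GPU is visible to CUDA. The helper name `env_for_gpu` is hypothetical; the only external fact relied on is that CUDA reads `CUDA_VISIBLE_DEVICES` when the process starts.

```python
import os


def env_for_gpu(gpu_index: int, base_env=None) -> dict:
    """Return a copy of the environment that exposes only one GPU to CUDA."""
    if gpu_index < 0:
        raise ValueError("gpu_index must be non-negative")
    env = dict(os.environ if base_env is None else base_env)
    # CUDA enumerates only the devices listed here; the selected device
    # is then seen by the process as device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


env = env_for_gpu(1, base_env={"PATH": "/usr/bin"})
print(env["CUDA_VISIBLE_DEVICES"])
# Pass env to the application, e.g. subprocess.run([...], env=env).
```

Because the selected device is renumbered to 0 inside the process, the `gpu_index` parameter would presumably stay at its default when this method is used.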
Test results
The table below shows the time spent on extraction of one biometric template using CPU and GPU:
Method | GPU | CPU |
12v1000 | 47 ms | 442 ms |
9v300 | 10 ms | 292 ms |
12v100 | 8 ms | 49 ms |
12v50 | 6 ms | 21 ms |
12v30 | 5 ms | 12 ms |
Note: NVIDIA GeForce GTX 1070 and Intel Core i5-9400 4.0GHz were used for the speed test.
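To make the comparison easier to read, the speedup for each method can be derived directly from the table. The short sketch below only divides the CPU time by the GPU time for the values listed above; it is not a benchmark.

```python
# Extraction times in ms as (GPU, CPU), copied from the table above.
timings_ms = {
    "12v1000": (47, 442),
    "9v300": (10, 292),
    "12v100": (8, 49),
    "12v50": (6, 21),
    "12v30": (5, 12),
}

# Speedup = CPU time / GPU time for the same method.
speedups = {method: cpu / gpu for method, (gpu, cpu) in timings_ms.items()}
for method, factor in speedups.items():
    print(f"{method}: {factor:.1f}x")
```

On this hardware the gain ranges from about 2.4x for 12v30 to about 29x for 9v300, so the heavier methods benefit the most from GPU acceleration.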
Troubleshooting
Error | Solution |
Assertion failed (Cannot open shared object file libtensorflow.so.2) | Make sure the library file libtensorflow.so.2 is in the same directory as the libfacerec.so library you are using |
Assertion failed (Cannot open shared object file tensorflow.dll) | Make sure the library file tensorflow.dll is in the same directory as the facerec.dll library you are using |
Slow initialization | Increase the default JIT cache size: `export CUDA_CACHE_MAXSIZE=2147483647` (see JIT Caching) |
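The first two errors in the table come down to a library file not sitting next to the Face SDK library. A quick way to check this before starting the application is sketched below; the helper name `missing_companions` is hypothetical, and the demo uses a temporary directory with an empty placeholder file instead of a real SDK installation.

```python
import tempfile
from pathlib import Path
from typing import List


def missing_companions(main_lib: str, companions: List[str]) -> List[str]:
    """Return the companion libraries that are not next to main_lib."""
    lib_dir = Path(main_lib).resolve().parent
    return [name for name in companions if not (lib_dir / name).is_file()]


# Demo: a directory that contains libfacerec.so but not libtensorflow.so.2.
with tempfile.TemporaryDirectory() as tmp:
    facerec = Path(tmp) / "libfacerec.so"
    facerec.touch()  # empty placeholder, not a real library
    missing = missing_companions(str(facerec), ["libtensorflow.so.2"])
    print(missing)
```

On Windows the same check would be run with `facerec.dll` and `tensorflow.dll`.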
Android
Currently, GPU acceleration is available for the following modules:
- Recognition methods (9v30, 9v300, 9v1000, 9v30mask, 9v300mask, 9v1000mask) (see Facial Recognition)
- The blf detector (see Face Capturing)
GPU usage can be enabled or disabled via the `use_mobile_gpu` flag in the configuration files of the `Capturer`, `Recognizer`, and `VideoWorker` objects (in the configuration file of the `VideoWorker` object, GPU is enabled for detectors). By default, mobile GPU support is enabled (the value is `1`). To disable GPU usage, change the `use_mobile_gpu` flag to `0`.
Test results
The table below shows the time spent on extraction of one biometric template using CPU and GPU:
Method | CPU | GPU |
9v1000 | 3660 ms | 610 ms |
9v300 | 1960 ms | 280 ms |
9v30 | 170 ms | 70 ms |
Note: The speed test was performed using Google Pixel 3.
Note: GPU acceleration doesn't work on some Android devices.