GPU Usage
Face recognition is computationally demanding, so OMNI modules can run their deep learning algorithms with GPU acceleration.
You can use GPU acceleration on:
- Windows x86 64-bit
- Linux x86 64-bit
- Android
- Jetson (JetPack 4.3/4.4)
This section lists the OMNI modules that support GPU acceleration, explains how to enable it, compares the timing characteristics of OMNI modules on CPU and GPU, and describes possible GPU-related errors and their solutions.
Desktop [beta]
Currently, GPU acceleration is available for the following modules (single GPU mode only):
- Recognizers (11v1000, 10v30, 10v100, 10v1000, 9v30mask, 9v300mask, 9v1000mask) (see Face Identification)
- Detectors (BLF, REFA, ULD) (see Face Capturing)
To run models on GPU, edit the configuration file of one of the supported recognizers: set `use_cuda` to `1`.
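As a minimal sketch of this edit, the snippet below flips the flag in the text of a configuration file. It assumes the file is XML and stores the flag as a simple element such as `<use_cuda>0</use_cuda>`; this section does not state the file format, and both the helper `set_config_flag` and the sample contents are hypothetical, so adapt them to your actual file.

```python
import re


def set_config_flag(config_text: str, name: str, value: int) -> str:
    """Return config_text with the element <name>...</name> set to value.

    Assumption: the flag is stored as a simple XML element.
    All other text in the configuration is left unchanged.
    """
    pattern = re.compile(rf"(<{re.escape(name)}>)[^<]*(</{re.escape(name)}>)")
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{value}{m.group(2)}", config_text
    )
    if count == 0:
        raise KeyError(f"flag {name!r} not found in configuration")
    return new_text


# Hypothetical configuration contents, for illustration only.
sample = "<config>\n  <use_cuda>0</use_cuda>\n  <gpu_index>0</gpu_index>\n</config>\n"
updated = set_config_flag(sample, "use_cuda", 1)
print(updated)
```

Editing the text in place, rather than re-serializing the whole file, keeps any declarations and comments in the configuration untouched.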
Windows/Linux
- Software requirements:
  - NVIDIA GPU Driver >= 410.48
  - CUDA Toolkit 10.1
  - cuDNN 7
  - (For Windows) Microsoft Visual C++ Redistributable for Visual Studio 2019
- Hardware requirements:
  - CUDA-compatible GPU (NVIDIA GTX 1050 Ti or better)
You can also use pre-built Docker containers with CUDA support, such as `nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04` (note that some licenses may be unavailable in this case).
GPU acceleration is performed on one of the available GPUs (by default, on the GPU with index `0`). The GPU index can be changed as follows:
- via the `gpu_index` parameter in the configuration file
- via the `CUDA_VISIBLE_DEVICES` environment variable (see more info about CUDA Environment Variables)
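The environment-variable route can be sketched as follows. `CUDA_VISIBLE_DEVICES` has to be set before the process initializes CUDA, so the example passes it to a child process at launch. The helper name `env_for_gpu` is hypothetical, and the child command only echoes the variable; replace it with your own application.

```python
import os
import subprocess
import sys


def env_for_gpu(physical_index: int) -> dict:
    """Copy the current environment and expose only one physical GPU."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(physical_index)
    return env


# Launch a child process that sees only physical GPU 1.
# Placeholder command: it prints the variable instead of running real work.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"],
    env=env_for_gpu(1),
    capture_output=True,
    text=True,
    check=True,
)
print(child.stdout.strip())
```

Note that CUDA renumbers the visible devices starting from 0, so when a single GPU is exposed this way, index `0` inside the process refers to that GPU.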
Jetson
- System requirements:
  - JetPack 4.3 or 4.4*
* Tests were performed on the Jetson TX2 and Jetson NX modules.
The archive with the required libraries is `jetson_jetpack_4.3_4.4.tar.xz`. By default, it uses the build for JetPack 4.4. If a build for JetPack 4.3 is required, move all files from the `lib/jetpack-4.3` directory to the `lib` directory.
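The move described above could be scripted roughly as shown below. This is a sketch under assumptions: the helper name `use_jetpack_43_build` is hypothetical, same-named files already in `lib` (the JetPack 4.4 defaults) are assumed to be safe to replace, and the demonstration runs on a temporary directory with a placeholder file rather than on the real archive.

```python
import shutil
import tempfile
from pathlib import Path


def use_jetpack_43_build(lib_dir: Path) -> list:
    """Move every file from lib_dir/jetpack-4.3 into lib_dir.

    Assumption: a same-named file already in lib_dir is replaced.
    Returns the names of the moved files.
    """
    source_dir = lib_dir / "jetpack-4.3"
    moved = []
    for item in sorted(source_dir.iterdir()):
        if item.is_file():
            target = lib_dir / item.name
            if target.exists():
                target.unlink()
            shutil.move(str(item), str(target))
            moved.append(item.name)
    return moved


# Demonstration on a temporary layout with a placeholder library file.
with tempfile.TemporaryDirectory() as tmp:
    lib = Path(tmp) / "lib"
    (lib / "jetpack-4.3").mkdir(parents=True)
    (lib / "libexample.so").write_text("4.4 build")
    (lib / "jetpack-4.3" / "libexample.so").write_text("4.3 build")
    moved = use_jetpack_43_build(lib)
    content = (lib / "libexample.so").read_text()

print(moved, content)
```

Back up the `lib` directory first if you may need to return to the JetPack 4.4 build.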
Timing characteristics
The table below shows the speed measurements for template creation using CPU and GPU:
Method | GPU | CPU |
11v1000 | 35 ms | 865 ms |
9v300 | 10 ms | 260 ms |
10v100 | 13 ms | 40 ms |
10v30 | 11 ms | 24 ms |
See the timing characteristics of GPU-based face detection in the Face detection section.
Note: the NVIDIA GeForce GTX 1080 Ti and Intel Core i7 were used for the speed test.
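To make the comparison easier to read, the speedup factor (CPU time divided by GPU time) can be computed directly from the table values. The short script below is only an illustration of that arithmetic; it adds no new measurements.

```python
# Template creation times in milliseconds, copied from the table above.
timings_ms = {
    "11v1000": {"gpu": 35, "cpu": 865},
    "9v300": {"gpu": 10, "cpu": 260},
    "10v100": {"gpu": 13, "cpu": 40},
    "10v30": {"gpu": 11, "cpu": 24},
}

# Speedup = CPU time / GPU time for each method.
speedups = {method: t["cpu"] / t["gpu"] for method, t in timings_ms.items()}

for method, factor in speedups.items():
    print(f"{method}: {factor:.1f}x faster on GPU")
```

On this hardware the gain is roughly 25x for the heavier methods (11v1000, 9v300) and about 2x to 3x for the lighter ones (10v100, 10v30).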
Troubleshooting
Error | Solution |
Assertion failed (Cannot open shared object file libtensorflow.so.2) | Make sure the library file libtensorflow.so.2 is in the same directory as the libfacerec.so library you are using |
Assertion failed (Cannot open shared object file tensorflow.dll) | Make sure the library file tensorflow.dll is in the same directory as the facerec.dll library you are using |
Slow initialization | Increase the default JIT cache size: `export CUDA_CACHE_MAXSIZE=2147483647` (see JIT Caching) |
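
The checks from the table can also be performed from a launcher script before the SDK library is loaded. The sketch below assumes a Linux layout; the helper name `companion_present` is hypothetical, and the demonstration uses empty placeholder files in a temporary directory instead of real libraries.

```python
import os
import tempfile
from pathlib import Path


def companion_present(facerec_lib: Path, companion_name: str) -> bool:
    """Return True if companion_name sits in the same directory as facerec_lib."""
    return (facerec_lib.parent / companion_name).is_file()


# Raise the JIT cache limit for this process and its children.
# This has to happen before the SDK library (and therefore CUDA) is loaded.
os.environ["CUDA_CACHE_MAXSIZE"] = "2147483647"

# Demonstration with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    facerec = Path(tmp) / "libfacerec.so"
    facerec.touch()
    before = companion_present(facerec, "libtensorflow.so.2")
    (Path(tmp) / "libtensorflow.so.2").touch()
    after = companion_present(facerec, "libtensorflow.so.2")

print(before, after)
```

On Windows the same check applies to `tensorflow.dll` next to `facerec.dll`.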
Android [beta]
Currently, GPU acceleration is available for the following modules:
- Recognizers (9v30, 9v300, 9v1000, 9v30mask, 9v300mask, 9v1000mask) (see Face Identification)
- The blf detector (see Face Capturing)
GPU usage can be enabled or disabled via the `use_mobile_gpu` flag in the configuration files of the `Capturer`, `Recognizer`, and `VideoWorker` objects (in the configuration file of the `VideoWorker` object, the flag enables GPU for detectors). By default, mobile GPU support is enabled (the value is `1`). To disable GPU usage, set the `use_mobile_gpu` flag to `0`.
Timing characteristics
The table below shows the speed measurements for OMNI modules using CPU and GPU:
Method | CPU | GPU |
9v1000 | 3660 ms | 610 ms |
9v300 | 1960 ms | 280 ms |
9v30 | 170 ms | 70 ms |
Note: The speed test was performed using Google Pixel 3.