Version: 3.19.1

GPU Usage

Face recognition is computationally intensive, so Face SDK modules can use GPU acceleration to run their deep learning algorithms.

You can use GPU acceleration on:

Windows x86 64-bit
Linux x86 64-bit
Android
NVIDIA Jetson (JetPack 4.3/4.4)

In this section you'll learn:

which Face SDK modules support GPU acceleration
how to enable GPU acceleration
how long Face SDK modules take on CPU and on GPU
which errors can occur during GPU usage and how to resolve them

Desktop

Currently, GPU acceleration is available for the following modules (single GPU mode only):

Recognition methods (11v1000, 10v30, 10v100, 10v1000, 9v30mask, 9v300mask, 9v1000mask) (see Facial Recognition)
Detectors (BLF, REFA, ULD) (see Face Capturing)
Most Processing Blocks

To run models on a GPU, edit the appropriate recognizer configuration file and change the use_cuda parameter from 0 to 1.
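
As a minimal sketch of this edit, the Python snippet below flips use_cuda in an XML configuration. It assumes the parameter is stored as a <use_cuda> element; the sample XML, its <config> root, and the set_config_flag helper are hypothetical, so check the layout of your actual recognizer configuration file before adapting it.

```python
import xml.etree.ElementTree as ET


def set_config_flag(xml_text: str, name: str, value: int) -> str:
    """Return xml_text with the text of every <name> element set to value.

    Assumes the parameter is stored as element text, e.g. <use_cuda>0</use_cuda>.
    """
    root = ET.fromstring(xml_text)
    matches = list(root.iter(name))
    if not matches:
        raise KeyError(f"parameter {name!r} not found in config")
    for element in matches:
        element.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical config fragment; a real recognizer config contains more fields.
SAMPLE_CONFIG = "<config><use_cuda>0</use_cuda><gpu_index>0</gpu_index></config>"

updated = set_config_flag(SAMPLE_CONFIG, "use_cuda", 1)
print(updated)
```

The helper works on a string so it can be tried without touching an installed SDK. To apply it to a real file, read the file, pass its text through the helper, and write the result back after making a backup copy.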

Windows/Linux

General requirements:
- Software requirements:
  Windows:
  - NVIDIA GPU Driver >= 441.22
  - Microsoft Visual C++ Redistributable for Visual Studio 2019
  Linux:
  - NVIDIA GPU Driver >= 440.33
- Hardware requirements:
  - CUDA compatible GPU (NVIDIA GTX 1050 Ti or better)
For CUDA 11:
- Software requirements:
  Windows:
  - CUDA Toolkit 11.0 ≤ version < 12
  - cuDNN v8.8.0 - v9.0.0 for CUDA 11.x
  Linux:
  - CUDA Toolkit 11.0 ≤ version < 12
  - cuDNN v8.8.0 for CUDA 11.x
- Hardware requirements:
  - CUDA compatible GPU (NVIDIA GTX 1050 Ti or better up to and including RTX 3090)
For CUDA 10:
- Software requirements:
  Windows:
  - CUDA Toolkit 10.1
  - cuDNN v7.6.5 for CUDA 10.1
  Linux:
  - CUDA Toolkit 10.1
  - cuDNN v7.6.5 for CUDA 10.1
- Hardware requirements:
  - CUDA compatible GPU (NVIDIA GTX 1050 Ti or better, but below the 30xx versions)
- Other requirements:
  - For the old API: edit the XML config and add the field <use_legacy>1</use_legacy>
  - For the Processing Block API: add a "use_legacy" key with the value true to the Context used to create the Processing Block.

You can also use pre-built Docker containers with CUDA support, such as nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04 (note that some licenses may be unavailable in this case).

GPU acceleration runs on one of the available GPUs, by default the GPU with index 0. You can change the GPU index in either of the following ways:

via the gpu_index parameter in the recognizer configuration file
via the CUDA_VISIBLE_DEVICES environment variable (see CUDA Environment Variables)
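
The environment-variable route can be sketched as follows. CUDA_VISIBLE_DEVICES is a standard CUDA variable and has to be set before the process initializes CUDA, which is why this sketch builds the environment for a child process. The child_env_for_gpu helper and the application name in the usage comment are hypothetical.

```python
import os


def child_env_for_gpu(physical_index: int, base_env=None) -> dict:
    """Build an environment in which only one physical GPU is visible.

    CUDA renumbers visible devices from 0, so inside the child process
    the selected GPU is enumerated as device 0.
    """
    if physical_index < 0:
        raise ValueError("GPU index must be non-negative")
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(physical_index)
    return env


env = child_env_for_gpu(1, base_env={"PATH": "/usr/bin"})
print(env["CUDA_VISIBLE_DEVICES"])

# Usage (hypothetical application name):
# import subprocess
# subprocess.run(["./my_face_sdk_app"], env=child_env_for_gpu(1), check=True)
```

Because the visible GPU is renumbered to 0 inside the child process, a gpu_index left at its default of 0 would presumably then select it. Verify this on your setup before combining both mechanisms.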

NVIDIA Jetson

System requirements:
- JetPack 4.3 or 4.4*

* Tests were performed on the Jetson TX2 and Jetson NX modules.

You can select the jetson_jetpack_4.3_4.4 component in the Face SDK installation wizard. By default, it uses the build for JetPack 4.4. If you need the build for JetPack 4.3, move all files from the lib/jetpack-4.3 directory to the lib directory.
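
The file move for JetPack 4.3 can be sketched in Python as below. The directory names come from the paragraph above. The use_jetpack_43_build helper and the placeholder file name are hypothetical, and replacing same-named files in lib is an assumption about the intended outcome, so back up the lib directory before running anything like this on a real installation.

```python
import shutil
import tempfile
from pathlib import Path


def use_jetpack_43_build(sdk_root: Path) -> list:
    """Move every file from lib/jetpack-4.3 into lib, replacing same-named files."""
    lib_dir = sdk_root / "lib"
    src_dir = lib_dir / "jetpack-4.3"
    moved = []
    for src in sorted(src_dir.iterdir()):
        if src.is_file():
            dst = lib_dir / src.name
            if dst.exists():
                dst.unlink()  # assumption: the default build of this file is replaced
            shutil.move(str(src), str(dst))
            moved.append(src.name)
    return moved


# Demonstration on a throwaway directory tree with a placeholder file name.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "lib" / "jetpack-4.3").mkdir(parents=True)
    (root / "lib" / "libexample.so").write_text("jetpack-4.4 build")
    (root / "lib" / "jetpack-4.3" / "libexample.so").write_text("jetpack-4.3 build")
    moved_names = use_jetpack_43_build(root)
    active_build = (root / "lib" / "libexample.so").read_text()

print(moved_names, active_build)
```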

Test Results

The table below shows the time required to extract one biometric template on GPU and on CPU:

Method	GPU	CPU
12v1000	47 ms	442 ms
9v300	10 ms	292 ms
12v100	8 ms	49 ms
12v50	6 ms	21 ms
12v30	5 ms	12 ms

Note: NVIDIA GeForce GTX 1070 and Intel Core i5-9400 4.0GHz were used for the speed test.
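
To read the table as speedup factors, the short sketch below divides each CPU time by the corresponding GPU time. The timings are copied from the table; the speedup helper is only an illustration.

```python
# Timings copied from the table above, in milliseconds: (GPU, CPU).
TIMINGS_MS = {
    "12v1000": (47, 442),
    "9v300": (10, 292),
    "12v100": (8, 49),
    "12v50": (6, 21),
    "12v30": (5, 12),
}


def speedup(gpu_ms: float, cpu_ms: float) -> float:
    """Return how many times faster the GPU run is than the CPU run."""
    return cpu_ms / gpu_ms


for method, (gpu_ms, cpu_ms) in TIMINGS_MS.items():
    print(f"{method}: {speedup(gpu_ms, cpu_ms):.1f}x")
```

On these figures the GPU is between roughly 2.4 times (12v30) and 29 times (9v300) faster than the CPU, and the lighter methods generally benefit least.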

Troubleshooting

Error	Solution
Assertion failed (Cannot open shared object file libtensorflow.so.2)	Make sure the library file libtensorflow.so.2 is in the same directory as the libfacerec.so library you are using
Assertion failed (Cannot open shared object file tensorflow.dll)	Make sure the library file tensorflow.dll is in the same directory as the facerec.dll library you are using
Slow initialization	Increase the default JIT cache size: `export CUDA_CACHE_MAXSIZE=2147483647` (see JIT Caching)
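
The checks in this table can be scripted before an application starts. The sketch below uses the library names from the error messages and the CUDA_CACHE_MAXSIZE value from the last row. The helper names are hypothetical, and the environment variable only takes effect if it is set before the process initializes CUDA.

```python
import tempfile
from pathlib import Path

# Library names taken from the error messages in the table above.
TENSORFLOW_SIBLING = {".dll": "tensorflow.dll", ".so": "libtensorflow.so.2"}


def missing_tensorflow_library(facerec_library: Path):
    """Return the path where the TensorFlow library is expected if it is absent, else None."""
    sibling = TENSORFLOW_SIBLING[facerec_library.suffix]
    expected = facerec_library.parent / sibling
    return None if expected.is_file() else expected


def with_larger_jit_cache(base_env: dict) -> dict:
    """Copy base_env and add the CUDA_CACHE_MAXSIZE value suggested above."""
    env = dict(base_env)
    env["CUDA_CACHE_MAXSIZE"] = "2147483647"
    return env


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    facerec = Path(tmp) / "libfacerec.so"
    facerec.touch()
    before = missing_tensorflow_library(facerec)  # path of the missing file
    (Path(tmp) / "libtensorflow.so.2").touch()
    after = missing_tensorflow_library(facerec)   # None once the file exists

print(before is not None, after is None)
print(with_larger_jit_cache({})["CUDA_CACHE_MAXSIZE"])
```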

Android

Currently, GPU acceleration is available for the following modules:

Recognition methods (9v30, 9v300, 9v1000, 9v30mask, 9v300mask, 9v1000mask) (see Facial Recognition)
The blf detector (see Face Capturing)

GPU usage is controlled by the use_mobile_gpu flag in the configuration files of the Capturer, Recognizer, and VideoWorker objects (in the VideoWorker configuration file, the flag enables the GPU for detectors). Mobile GPU support is enabled by default (the value is 1). To disable GPU usage, set use_mobile_gpu to 0.

Test Results

The table below shows the time required to extract one biometric template on CPU and on GPU:

Method	CPU	GPU
9v1000	3660 ms	610 ms
9v300	1960 ms	280 ms
9v30	170 ms	70 ms

Note: The speed test was performed using Google Pixel 3.
