
Face capturing

Overview

Face capturing includes the following stages: face detection, fitting of face landmarks, and calculation of head rotation angles.

Face detection

Face detection in Face SDK is performed by a set of detectors. A detector is an algorithm of the libfacerec library that applies neural networks to detect faces in images and video streams. The detection result is the coordinates of the bounding box (bbox) around the detected face.

After a face is detected, the image is automatically cropped to the calculated bounding box (bbox) coordinates to optimize further operations. Fitting of anthropometric points and calculation of head rotation angles are performed on this cropped image (face crop).

Detectors

Currently, the following detectors are available:

  • LBF: An outdated detector, not recommended for use;
  • BLF: A detector that provides higher quality and faster detection than LBF for faces of medium size and larger (including masked faces). On Android, GPU acceleration can be used (enabled by default);
  • REFA: A detector that is slower than the LBF and BLF detectors but guarantees better detection quality for faces of various sizes (including masked faces). Recommended for use in expert systems;
  • ULD: A new detector that is faster than REFA and also detects faces of various sizes (including masked faces);
  • SSYV: A new detector.

The tables below show how the detectors work in various conditions. The detection thresholds (score_threshold), the minimum size of a detected face (min_size), and other parameters can be configured in the capturer configuration files located in the conf folder of the Face SDK distribution kit (see the Capturer configuration parameters section).
[Table (images): detection examples for BLF (score_threshold=0.6), REFA (min_size=0.2, score_threshold=0.89), and ULD (min_size=10, score_threshold=0.7)]
[Table (images): detection examples for ULD (score_threshold=0.4), ULD (score_threshold=0.7), and REFA (score_threshold=0.89)]
note

Face detectors can be accelerated using AVX2 (available on Linux x86 64-bit only). To use the AVX2 instruction set, move the contents of the lib/tensorflow_avx2 directory to the lib directory and define the use_avx2 parameter in the configuration file. You can check the available instruction sets by running the grep flags /proc/cpuinfo command.
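If needed, the parameter can presumably also be set through the API like any other configuration parameter. A minimal sketch, assuming use_avx2 accepts an integer 1/0 value and can be overridden the same way as the parameters shown in the examples below:

pbio::FacerecService::Config capturer_config("common_capturer4.xml");
// assumption: use_avx2 is overridable like other integer config parameters
capturer_config.overrideParameter("use_avx2", 1);
pbio::Capturer::Ptr capturer = service->createCapturer(capturer_config);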

Fitting of face landmarks

Face landmarks in Face SDK are determined using fitters. A fitter is an algorithm of the libfacerec library that positions a set of anthropometric points with 2D/3D coordinates on a specific detected face. Several types of fitters that use different sets of face landmarks are described below.

Face landmarks

Note: You can learn how to display face landmarks and head rotation angles in our tutorial.

Five sets of face landmarks are available: esr, singlelbf, doublelbf, fda, and mesh.

  • The esr set is our first set and was the only one available in previous SDK versions. It contains 47 points.
  • The singlelbf and doublelbf sets provide higher accuracy than esr. The singlelbf set contains 31 points; the doublelbf set contains 101 points. In fact, doublelbf consists of two concatenated sets: the last 31 points of doublelbf duplicate the singlelbf set (in the same order).
  • The fda set provides high accuracy in a wide range of face angles (up to full profile), in contrast to the previous sets, so we recommend using detectors with this set. However, recognition algorithms still require face samples to be close to frontal. The fda set contains 21 points.
  • The mesh set is currently the newest. It contains 470 3D face points. We recommend using this set to get a 3D face mesh.
  • fda set of points: RawSample.getLeftEye returns point 7, RawSample.getRightEye returns point 10.
  • esr set of points: RawSample.getLeftEye returns point 16, RawSample.getRightEye returns point 17.
  • singlelbf set of points: RawSample.getLeftEye returns point 29, RawSample.getRightEye returns point 30.
  • first 70 points of the doublelbf set (the remaining 31 points are taken from singlelbf): RawSample.getLeftEye returns point 68, RawSample.getRightEye returns point 69.
  • mesh set of points: RawSample.getLeftEye returns point 468, RawSample.getRightEye returns point 469.
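These points can be read directly from a captured sample. A minimal sketch using the RawSample.getLandmarks / getLeftEye / getRightEye methods named above; the x and y point fields are an assumption to be checked against the API reference:

// iterate over all landmarks of the fitted point set
for (const auto& point : sample->getLandmarks())
    std::cout << point.x << " " << point.y << std::endl;

// eye points (the returned indices depend on the point set, as listed above)
const auto left_eye = sample->getLeftEye();
const auto right_eye = sample->getRightEye();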

Iris landmarks

In addition to the standard set of face landmarks, you can get iris landmarks: an extended set of eye points that includes points of the pupils and eyelids. The returned vector contains 40 points for the left and right eyes in the order shown in the image below. For each eye, 20 points are returned: the first 5 points refer to the pupil (its center and points on the circle), and the remaining 15 points form the contour of the eyelids. A rendering example is available in demo (C++/Java/C#).
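A minimal sketch of reading iris landmarks, assuming iris_enabled is set in the capturer config (see Capturer configuration parameters below); the left-then-right eye order is an assumption based on the description above:

const auto iris_points = sample->getIrisLandmarks();
// the vector is empty if iris_enabled is 0 in the capturer config
for (size_t i = 0; i < iris_points.size(); ++i)
    std::cout << i << ": " << iris_points[i].x << ", " << iris_points[i].y << std::endl;
// assumed layout: points 0-19 are the left eye, 20-39 the right eye;
// within each eye, point 0 is the pupil center, points 1-4 lie on the
// pupil circle, and points 5-19 form the eyelid contour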

Head rotation angles

At this stage, head rotation angles are calculated relative to the observation axis. The result is three head rotation angles: pitch, yaw, and roll. The accuracy of these angles depends on the set of anthropometric points used.
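A minimal sketch of reading the three angles from a captured sample via RawSample.getAngles; the field names of the returned structure (yaw, pitch, roll) are an assumption to be checked against the API reference:

const auto angles = sample->getAngles();
std::cout << "yaw: " << angles.yaw
          << ", pitch: " << angles.pitch
          << ", roll: " << angles.roll << std::endl;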

Face tracking (optional)

To track faces in video streams, use trackers. A tracker is an algorithm of the libfacerec library that tracks face positions from frame to frame. As a result, a person's track is formed: a sequence of video stream frames that belong to the same person. Each track is assigned a unique identifier (track_id) that doesn't change until the face is lost.
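A minimal sketch of grouping tracked samples by track_id, assuming a Capturer created with a tracking configuration file (see Creating capturer object below) and a sequence of decoded video frames:

std::map<int, std::vector<pbio::RawSample::Ptr>> tracks;
for (const pbio::RawImage& frame : frames) // frames: your decoded video frames
{
    // samples with the same getID value belong to the same person's track
    for (const pbio::RawSample::Ptr& sample : capturer->capture(frame))
        tracks[sample->getID()].push_back(sample);
}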

Face capturing

Creating capturer object

To detect faces, you first need to create an object of the Capturer class. When creating the object, a capturer configuration file must be specified. It defines the type of detector and the set of anthropometric points (for more details, see Capturer Configuration Files). The capturer configuration file also lets you configure other detection parameters that affect the quality and speed of the entire algorithm. The detector and the set of anthropometric points used are given in the name of the capturer configuration file, for example, common_capturer_blf_fda_front.xml: the blf detector, the fda set of points.

For tracking you can use capturer configuration files of two types:

  • common_video_capturer - Provides higher speed, but lower quality compared to fda_tracker_capturer.
  • fda_tracker_capturer - Provides higher quality, but lower speed compared to common_video_capturer.

All available configuration files are stored in the conf folder of the Face SDK distribution kit.

Example 1

Create a Capturer object using the FacerecService.createCapturer method and specify the name of a capturer configuration file as an argument.

pbio::Capturer::Ptr capturer = service->createCapturer("common_capturer4.xml");

Example 2

If you need to override some parameter values of the capturer configuration file when creating Capturer, follow the steps below:

  1. Create a FacerecService.Config object, specifying the name of the configuration file as an argument.
  2. Override the parameter values using the Config.overrideParameter method.
  3. Create a Capturer object using the FacerecService.createCapturer method, passing FacerecService.Config as an argument.
pbio::FacerecService::Config capturer_config("common_capturer4.xml");
capturer_config.overrideParameter("min_size", 200);
pbio::Capturer::Ptr capturer = service->createCapturer(capturer_config);

Example 3

Some parameter values can be changed in an already created Capturer object using the Capturer.setParameter method.

pbio::Capturer::Ptr capturer = service->createCapturer("common_capturer4.xml");
capturer->setParameter("min_size", 200);
capturer->setParameter("max_size", 800);
// capturer->capture(...);
// ...
capturer->setParameter("min_size", 100);
capturer->setParameter("max_size", 400);
// capturer->capture(...);

Capturer configuration parameters

The following parameters inside the capturer configuration files can be changed using the FacerecService.Config.overrideParameter method:
  • coarse_score_threshold: Coarse detection confidence threshold. During detection, the detector creates a set of bboxes, each of which has a score value (a number from 0 to 1, indicating the degree of confidence that a face is in the bbox). Bboxes with scores are processed by the nms algorithm, which determines intersections (matches) between bboxes. The coarse_score_threshold parameter allows cutting off bboxes with a low score, which reduces the number of calculations performed by the nms algorithm.
  • score_threshold: Detection confidence threshold.
  • max_processed_width and max_processed_height: (for trackers) These parameters limit the size of the image passed to the internal detector of new faces.
  • min_size and max_size: Minimum and maximum size of a face for detection (for trackers: the size is defined for an image already downscaled according to the restrictions max_processed_width and max_processed_height).
  • min_neighbors: An integer detector parameter used to reject false detections. You can change this parameter based on the situation: for example, increase the value if a large number of false detections is observed, or decrease it if a large number of faces is not detected. If you aren't sure, we recommend that you do not change this parameter.
  • use_advanced_multithreading: Improves performance when multiple Capturer objects are running in parallel.
  • nms_iou_threshold: Analogue of the min_neighbors parameter, used in most current detectors.
  • min_detection_period: (for tracking) A real number that means the minimum time (in seconds) between two runs of the internal detector. A zero value means ‘no restrictions’. The parameter is used to reduce the processor load. Large values increase the latency in detection of new faces.
  • max_detection_period: (for tracking) An integer that means the max time (in frames) between two runs of the internal detector. A zero value means ‘no restrictions’. For example, if you process a video offline, you can set the value to 1 so as not to miss a single person.
  • max_occlusion_time_wait: (for tracking) A real number in seconds. When face occlusion is detected, the tracker holds the face position and tries to track it on new frames during this time.
  • fda_max_bad_count_wait: An integer. When fda_tracker detects the decline in the face quality, the algorithm tries to track this face with the general purpose tracker (instead of the fda method designed and tuned for faces) during at most fda_max_bad_count_wait frames.
  • base_angle: An integer: 0, 1, 2, or 3. Camera orientation angles: 0 means standard (default), 1 means +90 degrees, 2 means -90 degrees, 3 means 180 degrees. When you change camera orientation, you need to set the new orientation value for this parameter, otherwise the detection quality will decrease.
  • fake_detections_cnt: An integer. The number of start positions to search for a face using video_worker_fdatracker_fake_detector.xml. A start position is a fixed position of the face in the image. You can set the coordinates of a start position if you are sure that there is a face in the given area of the image. The image with the marked start position goes to the fake detector, which sends the image directly to the fitter. It is assumed that the image already contains a face, so the algorithm can immediately proceed to fitting the anthropometric points.
  • fake_detections_period: An integer. Each start position will be used once in fake_detections_period frames.
  • fake_rect_center_xN, fake_rect_center_yN, fake_rect_angleN, fake_rect_sizeN: Real numbers for parameters of start positions. N is from 0 to fake_detections_cnt – 1 inclusive. fake_rect_center_xN – x coordinate of a center relative to the image width. fake_rect_center_yN – y coordinate of a center relative to the image height. fake_rect_angleN – roll angle in degrees. fake_rect_sizeN – size relative to max(image width, image height).
  • downscale_rawsamples_to_preferred_size: An integer, 1 means enabled, 0 means disabled. Enabled by default. When enabled, Capturer downscales each sample to a suitable size (RawSample.downscaleToPreferredSize) to reduce memory consumption; however, this decreases system performance. We recommend that you disable downscale_rawsamples_to_preferred_size and call RawSample.downscaleToPreferredSize manually for RawSamples that you need to save or keep in RAM for a long time.
  • iris_enabled: Enables the extended set of eye points. 1 means enabled (the returned vector contains eye points), 0 means disabled (the returned vector is empty).
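Any of the listed parameters can be overridden the same way as in Example 2 above. A short sketch (the values are illustrative, not recommendations):

pbio::FacerecService::Config capturer_config("common_capturer4.xml");
capturer_config.overrideParameter("score_threshold", 0.7);
capturer_config.overrideParameter("downscale_rawsamples_to_preferred_size", 0);
capturer_config.overrideParameter("iris_enabled", 1);
pbio::Capturer::Ptr capturer = service->createCapturer(capturer_config);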

Starting capturing

You can pass an image to the detector in two ways:

  • Pass the data of the encoded image in JPG, PNG, TIF, or BMP format to the Capturer.capture method (see the sketch after the example below)
  • Pass the data of the decoded image to the method Capturer.capture, using the RawImage class

The captured face is stored in a RawSample object.

// read an image from a file
cv::Mat image;
image = cv::imread(image_path, cv::IMREAD_COLOR);

// create RawImage object
cv::Mat input_image;
cv::cvtColor(image, input_image, cv::COLOR_BGR2RGB);
pbio::RawImage input_rawimg(input_image.cols, input_image.rows, pbio::RawImage::Format::FORMAT_RGB, input_image.data);

// run detection
std::vector<pbio::RawSample::Ptr> samples = capturer->capture(input_rawimg);
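
The example above shows the second option (a decoded image wrapped in RawImage). For the first option, a minimal sketch that reads the encoded file bytes and passes them to the detector; the byte-vector overload of Capturer.capture is an assumption to be checked against the API reference:

// read the encoded image (JPG/PNG/TIF/BMP) as raw bytes
std::ifstream file(image_path, std::ios::binary);
std::vector<unsigned char> data((std::istreambuf_iterator<char>(file)),
                                std::istreambuf_iterator<char>());

// run detection on the encoded data
std::vector<pbio::RawSample::Ptr> samples = capturer->capture(data);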

For detection combined with tracking you can also call the Capturer.resetHistory method to delete all frames and faces from the history and start tracking on a new video sequence.

Capturing result

RawSample is an interface object that stores the capturing result. The following operations can be done using RawSample methods:

  • Get a sample id (RawSample.getID) (only for detection combined with tracking);
  • Get the detection confidence score (for BLF, REFA, ULD detectors). To do this, call the RawSample.getScore() method. As a result, you'll receive a float number in the range of [0, 1];
  • Get a face rectangle (RawSample.getRectangle), angles (RawSample.getAngles), left/right eye (RawSample.getLeftEye / RawSample.getRightEye), and face landmarks (RawSample.getLandmarks) (only if the face is frontal);
  • Get an extended set of eye points, which includes points of pupils and eyelids (RawSample.getIrisLandmarks());
  • Downscale an internal face image to suitable size (RawSample.downscaleToPreferredSize);
  • Serialize an object in a binary stream (RawSample.save or RawSample.saveWithoutImage). You can deserialize it later using FacerecService.loadRawSample or FacerecService.loadRawSampleWithoutImage;
  • Normalize a face image with subsequent cropping (see Face Normalization).

A RawSample object can also be passed to the methods for age, gender, quality, and liveness estimation (see Face Estimation, test_facecut, test_videocap), or to Recognizer.processing for template creation (see Facial Recognition, test_identify).
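A short sketch of typical RawSample calls; the Rectangle field names, the image format constant, and the save signature are assumptions to be verified against the API reference:

// bounding rectangle and detection score
const auto rect = sample->getRectangle();
std::cout << rect.x << " " << rect.y << " "
          << rect.width << " " << rect.height << std::endl;
std::cout << "score: " << sample->getScore() << std::endl; // BLF, REFA, ULD only

// serialize the sample (with the face image) to a binary stream
std::ofstream out("sample.bin", std::ios::binary);
sample->save(out, pbio::RawSample::IMAGE_FORMAT_JPG);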

Face normalization

Face normalization refers to the rotation of a non-frontal face to a frontal position. It improves face recognition and other operations with detected faces. A face can be normalized by one of the following RawSample methods:

  • RawSample.cutFaceImage: the normalized face is saved to the specified stream (for example, to a file); the encoding format is selected via RawSample.ImageFormat
  • RawSample.cutFaceRawImage: the normalized face is returned as a RawImage object, which stores the non-encoded image pixels in the RGB/BGR/GRAY format (selected via RawImage.Format)

Example of using RawSample.cutFaceRawImage:

auto raw_image_crop = sample->cutFaceRawImage(
    pbio::RawImage::Format::FORMAT_BGR,
    pbio::RawSample::FACE_CUT_FULL_FRONTAL);

cv::Mat img_crop(raw_image_crop.height, raw_image_crop.width, CV_8UC3, (void*)raw_image_crop.data);

Available face normalization types (RawSample.FaceCutType):

  • FACE_CUT_BASE: basic normalization (any sample type);
  • FACE_CUT_FULL_FRONTAL: ISO/IEC 19794-5 Full Frontal (only frontal sample type). It is used for saving face images in electronic biometric documents;
  • FACE_CUT_TOKEN_FRONTAL: ISO/IEC 19794-5 Token Frontal (only frontal sample type).

To preview the normalized face, call the RawSample.getFaceCutRectangle method by specifying the normalization type. As a result, you'll get four points – the corners of the rectangle that will be used for cropping.
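A minimal sketch of previewing the crop rectangle; the return value is assumed to be a container of four corner points with x and y fields, per the description above:

const auto corners = sample->getFaceCutRectangle(pbio::RawSample::FACE_CUT_FULL_FRONTAL);
for (const auto& p : corners)
    std::cout << p.x << ", " << p.y << std::endl;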

note

See the examples of face capturing and normalization for different programming languages in Samples section.

Capturer configs for specific business cases

There are several XML configs fine-tuned for specific business cases. The use case and the operation quality level are specified in the file name, for example, access_control_system_several_faces_q1.xml.

At the moment, there are configs for:

  • Access Control System (ACS) for several faces in the frame:

    • access_control_system_several_faces_q1.xml
    • access_control_system_several_faces_q2.xml
  • ACS for one face in the frame:

    • access_control_system_one_face_q1.xml
    • access_control_system_one_face_q2.xml
    • access_control_system_one_face_q3.xml
  • Safe City (indoor/outdoor video surveillance):

    • safety_city_q1.xml
    • safety_city_q2.xml
  • Remote Identification (mobile/desktop):

    • remote_identification_q1.xml
    • remote_identification_q2.xml