Version: 3.22.0 (latest)

Video stream processing

Overview

To process video streams, Face SDK uses the VideoWorker interface object, which handles thread control and synchronization. You only need to provide decoded video frames and register a few callback functions.

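The VideoWorker object can be used to:

  • track faces
  • create templates
  • estimate age, gender, and emotions
  • estimate liveness
  • identify faces against a database
  • perform short-time identification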

See an example of using VideoWorker in video_recognition_demo.

Create VideoWorker object

To create VideoWorker, use the FacerecService.createVideoWorker method.

Examples

// Load the VideoWorker configuration and limit the number of search results
// returned in the MatchFound callback.
pbio::FacerecService::Config video_worker_config("video_worker_lbf.xml");
video_worker_config.overrideParameter("search_k", 3);

// service is an existing FacerecService instance; the remaining variables
// are described in the list below.
pbio::VideoWorker::Ptr video_worker = service->createVideoWorker(
    pbio::VideoWorker::Params()
        .video_worker_config(video_worker_config)
        .recognizer_ini_file(recognizer_config)
        .streams_count(streams_count)
        .processing_threads_count(processing_threads_count)
        .matching_threads_count(matching_threads_count)
        .age_gender_estimation_threads_count(age_gender_estimation_threads_count)
        .emotions_estimation_threads_count(emotions_estimation_threads_count)
        .short_time_identification_enabled(enable_sti)
        .short_time_identification_distance_threshold(sti_recognition_threshold)
        .short_time_identification_outdate_time_seconds(sti_outdate_time)
);

Where:

  • video_worker_config: path to the VideoWorker configuration file, or a FacerecService.Config object.
  • video_worker_params: parameters of the VideoWorker constructor.
  • recognizer_config: the configuration file of the recognition method used (see Facial Recognition).
  • streams_count: the number of video streams.
  • processing_threads_count: the number of threads for template extraction. These threads are common for all video streams and distribute resources evenly across all video streams regardless of their workload (except for the video streams without faces in the frame).
  • matching_threads_count: the number of threads for comparing templates with the database. Like processing threads, these threads distribute the workload evenly across all video streams.
  • age_gender_estimation_threads_count: the number of threads for age and gender estimation. Like processing threads, these threads distribute the workload evenly across all video streams.
  • emotions_estimation_threads_count: the number of threads for emotion estimation. Like processing threads, these threads distribute the workload evenly across all video streams.
  • enable_sti: the flag that enables Short-Time Identification.
  • sti_recognition_threshold: the distance threshold for Short-Time Identification.
  • sti_outdate_time: the time period, in seconds, for Short-Time Identification.

All available configuration files are stored in the conf folder of the Face SDK distribution.

Provide video frames

To provide video frames, call VideoWorker.addVideoFrame. This method is thread-safe, so you can provide frames from different threads without additional synchronization.

The method arguments are frame (frame color image), stream_id (video stream identifier) and timestamp_microsec (frame timestamp in microseconds). The method returns frame_id (integer frame identifier) which will be used in callbacks to identify this frame.
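When frames come from a video file rather than a live camera, there may be no capture clock to read timestamp_microsec from. The sketch below shows one way to derive it from the frame index, assuming a constant frame rate. The helper frameTimestampMicrosec is hypothetical and not part of Face SDK.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Face SDK): returns a timestamp in
// microseconds for the frame with the given zero-based index, assuming
// the video has a constant frame rate.
inline std::uint64_t frameTimestampMicrosec(std::uint64_t frame_index, double fps)
{
    assert(fps > 0.0);
    // One second is 1,000,000 microseconds; round to the nearest integer.
    return static_cast<std::uint64_t>(frame_index * 1000000.0 / fps + 0.5);
}
```

For a 25 FPS video, consecutive frames are 40,000 microseconds apart, so frame 25 gets a timestamp of exactly one second.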

Callbacks

To return frame processing results, VideoWorker uses a set of callbacks. Processing data received from different threads is organized into a data structure that is passed as a callback argument. The attributes of this data structure differ for each callback type.

The following types of callbacks are available:

  • TrackingCallbackU (Face tracking, estimation of gender, age and emotions, liveness estimation)
  • TrackingLostCallbackU (Face tracking)
  • TemplateCreatedCallbackU (Creating templates)
  • MatchFoundCallbackU (Face identification)
  • StiPersonOutdatedCallbackU (Short-time identification)

To add a callback, use the VideoWorker.add(callback name) method. The method returns callback_id (integer identifier of the callback). To remove a callback, use the VideoWorker.remove(callback name) method with callback_id as an argument.

Face tracking

note

If VideoWorker is used only for face tracking, specify matching_threads_count=0 and processing_threads_count=0, and apply the standard Face Detector license. To create VideoWorker for one stream, set the streams_count=1 parameter.

You can use two callbacks for face tracking:

VideoWorker.TrackingCallbackU (Tracking callback)

While a person is in the camera's field of view, a person's track is formed (a sequence of video stream frames, which belongs to the same person). Each track is assigned its own integer identifier (track_id).

Video stream frames are passed to the tracking conveyor - a structure responsible for synchronizing and distributing frames among processing threads. After processing each frame, the Tracking callback is called, which returns the tracking results. Tracking results are passed in TrackingCallbackData structure (C++, Java, C#, Python).

Tracking callbacks with the same stream_id are called in ascending frame_id order. Therefore, if a callback with stream_id = 1 and frame_id = 102 was received immediately after a callback with stream_id = 1 and frame_id = 100, it means the frame with frame_id = 101 was skipped for the video stream 1.
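The ordering guarantee above makes it possible to count skipped frames by watching for gaps in frame_id. The following is a minimal sketch, assuming that frame_id values on one stream increase by one per submitted frame, as in the example above. SkippedFrameCounter is a hypothetical helper, not an SDK class; in a real application its method would be fed the stream_id and frame_id received in the Tracking callback.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical helper (not part of Face SDK): remembers the last frame_id
// seen on each stream and reports gaps between consecutive callbacks.
class SkippedFrameCounter
{
public:
    // Returns the number of frame_ids skipped on this stream since the
    // previous call; returns 0 for the first callback of a stream.
    int onTrackingCallback(int stream_id, int frame_id)
    {
        auto it = last_frame_id_.find(stream_id);
        if (it == last_frame_id_.end()) {
            last_frame_id_[stream_id] = frame_id;
            return 0;
        }
        // Callbacks of one stream arrive in ascending frame_id order.
        assert(frame_id >= it->second);
        const int gap = frame_id - it->second - 1;
        it->second = frame_id;
        return gap > 0 ? gap : 0;
    }

private:
    std::unordered_map<int, int> last_frame_id_;  // stream_id -> last frame_id
};
```

Each stream is tracked independently, so a gap on stream 1 does not affect the count for stream 2.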

VideoWorker.TrackingLostCallbackU (TrackingLost callback)

After a track is lost (for example, when a person leaves the frame), this callback returns the best sample and face template. The best sample can be empty if the weak_tracks_in_tracking_callback configuration parameter is enabled. Processing results are passed in TrackingLostCallbackData structure (C++, Java, C#, Python).

This is the last callback called for the pair of <stream_id, track_id>. That is, after this callback no Tracking, MatchFound or TrackingLost callback for this stream_id can contain a sample with the same track_id.

For each pair of <stream_id, track_id> mentioned in the Tracking callback, there is exactly one TrackingLost callback, except for the tracks removed when a video stream is reset with the VideoWorker.resetStream method.

The resetStream method returns track_id_threshold, an integer such that all tracks removed during the reset had track_id < track_id_threshold, and all new tracks will have track_id >= track_id_threshold. After resetStream returns control, no callbacks related to the previous tracks will be called, including the TrackingLost callback.
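Because tracks removed by a reset never receive a TrackingLost callback, any per-track state kept by the application has to be cleaned up using the returned threshold. The sketch below illustrates that clean-up under the assumption that the application stores its state in a map keyed by track_id. The function dropTracksBeforeReset and the string payload are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>

// Hypothetical helper (not part of Face SDK): erases application state for
// every track with track_id < track_id_threshold and returns how many
// entries were removed.
inline std::size_t dropTracksBeforeReset(
    std::map<int, std::string>& per_track_state,
    int track_id_threshold)
{
    // lower_bound points at the first key >= threshold, i.e. the first
    // track that may still receive callbacks after the reset.
    const auto first_kept = per_track_state.lower_bound(track_id_threshold);
    const std::size_t removed = static_cast<std::size_t>(
        std::distance(per_track_state.begin(), first_kept));
    per_track_state.erase(per_track_state.begin(), first_kept);
    assert(per_track_state.empty() ||
           per_track_state.begin()->first >= track_id_threshold);
    return removed;
}
```

The value passed as track_id_threshold would be the one returned by resetStream for the stream whose state is stored in the map.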

note

Exceptions in the callbacks will be caught and rethrown in the VideoWorker.checkExceptions member function. Therefore, you need to call the VideoWorker.checkExceptions method from time to time to check for errors.
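The capture-and-rethrow behaviour described in this note can be pictured with standard C++ facilities. The class below is a toy stand-in written for illustration only; it is not the SDK's VideoWorker and makes no claim about how the SDK is implemented. It stores the first exception thrown by a callback and rethrows it when checkExceptions is called.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <mutex>
#include <stdexcept>
#include <utility>

// Toy stand-in (NOT the SDK's VideoWorker): shows how an exception thrown
// inside a callback can be stored and rethrown later on another thread.
class ToyWorker
{
public:
    // Runs a callback the way a worker thread would: an exception is
    // captured instead of propagating to the caller.
    void runCallback(const std::function<void()>& callback)
    {
        assert(static_cast<bool>(callback));
        try {
            callback();
        } catch (...) {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!pending_) pending_ = std::current_exception();
        }
    }

    // Rethrows the stored exception, if any, on the calling thread.
    void checkExceptions()
    {
        std::exception_ptr error;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(error, pending_);
        }
        if (error) std::rethrow_exception(error);
    }

private:
    std::mutex mutex_;
    std::exception_ptr pending_;
};
```

In this model an error stays invisible until checkExceptions is called, which is why the check has to be performed periodically, for example once per submitted frame.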

caution

To avoid a deadlock, do not call methods that change the state of VideoWorker inside callbacks. Only the VideoWorker.getMethodName and VideoWorker.getStreamsCount member functions are safe to call in callbacks.

Creating templates

note

To track faces and create biometric templates, specify matching_threads_count=0 and processing_threads_count>0, and apply the Video Engine Standard license. To create VideoWorker for one stream, set the parameters streams_count=1, processing_threads_count=1 and matching_threads_count=0.

VideoWorker.TemplateCreatedCallbackU (TemplateCreated callback)

VideoWorker.TemplateCreatedCallbackU returns template generation results in TemplateCreatedCallbackData structure (C++, Java, C#, Python). This callback is called whenever a template is created within VideoWorker.

Template creation is enabled for all video streams by default. You can enable/disable template creation for a specific video stream using the VideoWorker.disableProcessingOnStream and VideoWorker.enableProcessingOnStream methods.

Estimation of age, gender, and emotions

note

To estimate age and gender, specify the parameter age_gender_estimation_threads_count > 0. To estimate emotions, set emotions_estimation_threads_count > 0.

The data about age, gender, and emotions is returned in VideoWorker.TrackingCallbackU. The data about emotions is constantly updated, and the data about age and gender is updated only if there is a sample of better quality.

By default, estimation of age, gender, and emotions is enabled for all video streams.

To disable estimation of age, gender, and emotions on a specified video stream, use the following methods:

  • VideoWorker.disableAgeGenderEstimationOnStream (age and gender)
  • VideoWorker.disableEmotionsEstimationOnStream (emotions)

To enable estimation of age, gender, and emotions on a specified video stream again, use the following methods:

  • VideoWorker.enableAgeGenderEstimationOnStream (age and gender)
  • VideoWorker.enableEmotionsEstimationOnStream (emotions)

Liveness estimation

Active Liveness

This type of liveness estimation requires the user to perform certain actions, for example, "turn the head" or "blink".

To enable active liveness, set the enable_active_liveness parameter in the VideoWorker configuration file to 1. Each face to be identified must then pass several of the following checks:

  • SMILE: smile. To use the SMILE check, specify the number of threads in the emotions_estimation_threads_count parameter of the VideoWorker object.
  • BLINK: blink
  • TURN_UP: turn your head up
  • TURN_DOWN: turn your head down
  • TURN_RIGHT: turn your head to the right
  • TURN_LEFT: turn your head to the left
  • PERSPECTIVE: face position check (move your face closer to the camera)

To enable/disable checks, open the configuration file active_liveness_estimator.xml and specify the values 0 (check disabled) or 1 (check enabled) for one or more checks: check_smile, check_perspective, check_blink, check_turn_up, check_turn_down, check_turn_right, check_turn_left.

Also, in the active_liveness_estimator.xml configuration file, you can configure the following parameters:

  • check_count: number of checks to perform. If the specified number is less than the number of enabled checks, the system randomly selects which checks to perform; if it is greater than the number of enabled checks, some checks are repeated.
  • max_frames_wait: check waiting time (in frames). If the check has not started during this time, it is considered failed, and the status CHECK_FAIL is returned.
  • rotation_yaw_threshold: threshold for passing the checks that turn the head left or right. To pass the check, you need to turn your head by the specified angle.
  • rotation_pitch_threshold: threshold for passing the checks that turn the head up or down. To pass the check, you need to turn your head by the specified angle.
  • blinks_threshold: threshold for passing the blink check. The default value is 0.3, where 0 means the eye is closed and 1 means the eye is open. A blink is counted when the eye-openness value is equal to or less than the threshold.
  • blinks_number: number of blinks required to pass the check.
  • emotion_threshold_smile: threshold for the smile check. The smaller the specified value, the weaker the smile needed to pass the check.
  • face_align_angle: before starting the check and between the checks, the face must be in a neutral position (frontal to the camera). The parameter value is the maximum allowable deviation of the head rotation angles to start the check.

The order of checks can be random (this mode is selected by default), or set during initialization (by creating a list of non-repeating checks).

See an example of setting a checklist in the video_recognition_demo sample (C++/C#/Java/Python).

The check status is returned in the active_liveness_result attribute of Tracking callback. This attribute contains the following fields:

  • verdict: status of the current check (the ActiveLiveness.Verdict object)
  • type: type of check (the ActiveLiveness.LivenessChecks object)
  • progress_level: check passing progress - a number in the range of [0,1]

The ActiveLiveness.Verdict object can take the following values:

  • ALL_CHECKS_PASSED: all checks passed
  • CURRENT_CHECK_PASSED: current check passed
  • CHECK_FAIL: check failed
  • WAITING_FACE_ALIGN: waiting for neutral face position
  • NOT_COMPUTED: liveness is not estimated
  • IN_PROGRESS: check is in progress
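An application typically turns these statuses into on-screen guidance for the user. The sketch below shows one possible mapping. LocalVerdict is a local enumeration that only mirrors the value names listed above, and livenessHint and its message texts are hypothetical; in a real application the verdict, check type, and progress would come from the active_liveness_result attribute.

```cpp
#include <cassert>
#include <string>

// Local mirror of the verdict names listed above (an assumption made for
// this sketch; the SDK provides its own ActiveLiveness.Verdict type).
enum class LocalVerdict {
    ALL_CHECKS_PASSED,
    CURRENT_CHECK_PASSED,
    CHECK_FAIL,
    WAITING_FACE_ALIGN,
    NOT_COMPUTED,
    IN_PROGRESS
};

// Hypothetical helper: builds a user-facing hint for the current status.
inline std::string livenessHint(LocalVerdict verdict,
                                const std::string& check_name,
                                float progress_level)
{
    assert(progress_level >= 0.0f && progress_level <= 1.0f);
    switch (verdict) {
    case LocalVerdict::ALL_CHECKS_PASSED:    return "Liveness confirmed";
    case LocalVerdict::CURRENT_CHECK_PASSED: return check_name + ": passed";
    case LocalVerdict::CHECK_FAIL:           return check_name + ": failed";
    case LocalVerdict::WAITING_FACE_ALIGN:   return "Look straight at the camera";
    case LocalVerdict::NOT_COMPUTED:         return "Liveness is not estimated";
    case LocalVerdict::IN_PROGRESS: {
        // Show progress as a whole percentage, rounded to nearest.
        const int percent = static_cast<int>(progress_level * 100.0f + 0.5f);
        return check_name + ": " + std::to_string(percent) + "%";
    }
    }
    return "";
}
```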

Face identification

note

For face tracking, template creation and matching with the database, specify matching_threads_count>0 and processing_threads_count>0 in the VideoWorker constructor parameters, and apply the Video Engine Extended license. To create VideoWorker for one stream, set the parameters streams_count=1, processing_threads_count=1 and matching_threads_count=1.

To set or change the database, use the VideoWorker.setDatabase method. This method can be called at any time. Pass elements (a vector of base elements) and acceleration (type of search acceleration) as arguments.

Identification works as follows: a template extracted from the sample of the tracked face is compared with the templates from the database. If the distance to the closest database element is less than the distance_threshold specified for this element, then a match is found.
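The matching rule can be illustrated with a small self-contained sketch. It assumes the distances from the query template to each database element have already been computed; ToyElement and findMatch are hypothetical names, and the SDK performs this comparison internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy database element (hypothetical): distance_to_query stands in for the
// distance between the tracked face's template and this element.
struct ToyElement {
    int person_id;
    float distance_to_query;
    float distance_threshold;  // per-element threshold
};

// Returns the index of the matched element, or -1 if no match is found.
// Only the closest element is tested against its own threshold.
inline int findMatch(const std::vector<ToyElement>& elements)
{
    int closest = -1;
    for (std::size_t i = 0; i < elements.size(); ++i) {
        assert(elements[i].distance_to_query >= 0.0f);
        if (closest < 0 ||
            elements[i].distance_to_query < elements[closest].distance_to_query) {
            closest = static_cast<int>(i);
        }
    }
    if (closest >= 0 &&
        elements[closest].distance_to_query < elements[closest].distance_threshold) {
        return closest;
    }
    return -1;
}
```

Note that in this model a farther element with a looser threshold does not produce a match when the closest element fails its own threshold.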

The VideoWorker.MatchFoundCallbackU callback is called after N consecutive matches with elements that belong to the same person, and returns the matching result. The result is passed in the structure MatchFoundCallbackData (C++, Java, C#, Python). The value of N can be set in the configuration file in the <consecutive_match_count_for_match_found_callback> parameter.
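The "N consecutive matches" condition can be modelled with a simple run-length counter kept per track. The sketch below is an illustration only: ConsecutiveMatchCounter is a hypothetical class, and it signals just the moment the run first reaches N, without modelling what the SDK does on later matches.

```cpp
#include <cassert>

// Hypothetical model (not an SDK class) of the run-length condition for
// one track: a match with a different person restarts the run.
class ConsecutiveMatchCounter
{
public:
    explicit ConsecutiveMatchCounter(int required_matches)
        : required_matches_(required_matches)
    {
        assert(required_matches > 0);
    }

    // Feed the person_id matched for the next template of the track.
    // Returns true when the run of identical person_ids reaches N.
    bool onMatch(int person_id)
    {
        if (run_length_ > 0 && person_id == last_person_id_) {
            ++run_length_;
        } else {
            last_person_id_ = person_id;
            run_length_ = 1;
        }
        return run_length_ == required_matches_;
    }

private:
    int required_matches_;
    int last_person_id_ = 0;
    int run_length_ = 0;
};
```

With N = 3, the sequence of matched persons 7, 7, 9, 9, 9 triggers only on the last element, because the switch from person 7 to person 9 restarts the run.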

You can set the <not_found_match_found_callback> parameter to 1 to enable this callback after N consecutive mismatches (mismatch happens when the closest element is beyond its distance_threshold). In this case match_result of the first element in VideoWorker.MatchFoundCallbackData.search_results will be at zero distance, and the person_id and element_id identifiers will be equal to VideoWorker.MATCH_NOT_FOUND_ID.

This callback will be called after at least one Tracking callback and before a TrackingLost callback with the same stream_id and track_id.

The maximum number of elements returned in the VideoWorker.MatchFoundCallbackData.search_results is set in the configuration file in the search_k parameter. This number can be changed with the FacerecService.Config.overrideParameter method, as in the example in the Create VideoWorker object section.

Short-time identification

note

Short-time identification does not require a separate license. To use this function, specify at least one thread for template creation (processing_threads_count>0).

Short-time identification (STI) is used to recognize a person who was in the frame some time ago, even if this person is not in the database or identification is disabled. With STI enabled, a person who was tracked, lost, and then tracked again within, for example, one minute, will be recognized as the same person.

To enable STI, use the enable_sti flag in the parameters of the VideoWorker constructor.

When a person enters the frame, the system creates a sample, used to extract a biometric template. Then, this template is compared with the face templates of people who left the frame no more than sti_outdate_time seconds ago (sti_outdate_time is set in the parameters of the VideoWorker constructor).

Matched tracks are grouped as sti_person. The ID of this group (sti_person_id) is equal to the track_id value of the first element that formed the sti_person group.

When the sti_outdate_time period expires for a specific sti_person group, VideoWorker.StiPersonOutdatedCallbackU is called. The result is passed in the structure StiPersonOutdatedCallbackData (C++, Java, C#, Python).
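The grouping rule can be sketched with a toy model. Here a template is reduced to a single float and the distance between two templates is their absolute difference, which is an assumption made purely for illustration; real templates are compared by the recognizer. ToyStiGrouper and LostTrack are hypothetical names, and the outdated-group callback is not modelled.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Toy record of a track that has left the frame (hypothetical).
struct LostTrack {
    int sti_person_id;
    float template_value;   // stand-in for a biometric template
    double lost_at_seconds;
};

// Toy model (not an SDK class) of short-time identification grouping.
class ToyStiGrouper
{
public:
    ToyStiGrouper(float distance_threshold, double outdate_time_seconds)
        : distance_threshold_(distance_threshold),
          outdate_time_seconds_(outdate_time_seconds)
    {
        assert(distance_threshold > 0.0f && outdate_time_seconds > 0.0);
    }

    // Returns the sti_person_id for a new track: the id of a recently lost
    // matching track, or the track's own id if it starts a new group.
    int onNewTrack(int track_id, float template_value, double now_seconds) const
    {
        for (const LostTrack& lost : lost_) {
            const bool recent =
                now_seconds - lost.lost_at_seconds <= outdate_time_seconds_;
            const bool similar =
                std::fabs(template_value - lost.template_value) < distance_threshold_;
            if (recent && similar) return lost.sti_person_id;
        }
        return track_id;
    }

    // Remembers a track that has just been lost.
    void onTrackLost(int sti_person_id, float template_value, double now_seconds)
    {
        lost_.push_back({sti_person_id, template_value, now_seconds});
    }

private:
    float distance_threshold_;
    double outdate_time_seconds_;
    std::vector<LostTrack> lost_;
};
```

A track that reappears 25 seconds after being lost joins the existing group, while the same face reappearing after the outdate time starts a new group with its own track_id as sti_person_id.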

VideoWorker configuration parameters

The following parameters from the configuration file can be changed with the FacerecService.Config.overrideParameter method:
  • max_processed_width and max_processed_height: These parameters limit the size of the image that is submitted to the detector.
  • min_size and max_size: Minimum and maximum face size for detection (the size is defined for the image already downscaled under max_processed_width and max_processed_height). You can specify relative values for the REFA detector; the absolute values are then computed relative to the image width.
  • min_neighbors: An integer detector parameter, used to reject false detections. You can change this parameter based on the situation. For example, you can increase the value if a large number of false detections is observed, or decrease the value if a large number of faces is not detected. If you aren't sure, we recommend you not change this parameter.
  • min_detection_period: A real number that means the minimum time (in seconds) between two runs of the detector. A zero value means ‘no restrictions’. The parameter is used to reduce the processor load. Large values increase the latency in detection of new faces.
  • max_detection_period: An integer that means the max time (in frames) between two runs of the detector. A zero value means ‘no restrictions’. For example, if you process a video offline, you can set the value to 1 so as not to miss a single person.
  • consecutive_match_count_for_match_found_callback: An integer that means the number of consecutive matches of a track with the same person from the database to consider this match valid (see also VideoWorker.MatchFoundCallbackU).
  • recognition_yaw_min_threshold, recognition_yaw_max_threshold, recognition_pitch_min_threshold and recognition_pitch_max_threshold: Real numbers that mean the restrictions on orientation of the face to be used for recognition.
  • min_tracking_face_size: A real number that means the minimum size of a tracked face.
  • max_tracking_face_size: A real number that means the maximum size of a tracked face. Non-positive value removes this limitation.
  • min_template_generation_face_size: A real number that means the minimum face size for template generation. Faces of smaller size will be marked as weak = true.
  • single_match_mode: An integer, 1 means that the single match mode is enabled, 0 means that it is disabled. If this mode is on, a track that has been matched with a person from the database will not generate another template or be matched with the database again.
  • delayed_samples_in_tracking_callback: An integer, 1 means enabled, 0 means disabled. The detector, used to detect new faces, does not have enough time to process every frame, so some samples may be received with a delay. If the value is 1, all delayed samples are passed to the Tracking callback. Otherwise, delayed samples appear only when the callback order would otherwise be violated (a track sample must appear at least once in the Tracking callback before the TrackingLost callback is called for this track). To determine which frame a sample actually belongs to, use the RawSample.getFrameID method.
  • weak_tracks_in_tracking_callback: An integer, 1 means enabled, 0 means disabled. By default this flag is disabled. Samples with the flag of weak = true are not passed to the Tracking callback if at the time of their creation there were no samples with the same track_id (track_id = sample.getID()) with the flag of weak=false. If weak_tracks_in_tracking_callback is enabled, all samples are passed to the Tracking callback, so the TrackingLost callback can be called with best_quality_sample = NULL.
  • search_k: An integer that means the maximum number of elements returned to the MatchFound callback, i.e. this is the k parameter passed internally to the Recognizer.search method.
  • processing_queue_size_limit: An integer that means the max number of samples in the queue for template creation. The integer is used to limit the memory consumption in cases, where new faces appear faster than templates are created.
  • matching_queue_size_limit: An integer that means the max number of templates in the queue for matching with the database. The integer is used to limit the memory consumption in cases, where new templates appear faster than they are matched with the database (this can happen if you use a very large database).
  • recognizer_processing_less_memory_consumption: An integer that will be used as a value of the processing_less_memory_consumption flag in the FacerecService.createRecognizer method to create Recognizer object.
  • not_found_match_found_callback: An integer, 1 means enabled, 0 means disabled, calls VideoWorker.MatchFoundCallbackU after N consecutive mismatches.
  • depth_data_flag: An integer value, 1 turns on depth frame processing to confirm face liveness during face recognition, 0 turns off this mode. All overridden parameters with the depth_liveness. name prefix are forwarded to the DepthLivenessEstimator config with the prefix removed. So, to override the paramname parameter for depth liveness, override depth_liveness.paramname for VideoWorker. See FacerecService.Config.overrideParameter.
  • timestamp_distance_threshold_in_microsecs: Maximum allowable distance between the timestamps of a color image and a corresponding depth frame in microseconds; used if depth_data_flag is set.
  • max_frames_number_to_synch_depth: Maximum queue size when synchronizing the depth data; used if depth_data_flag is set.
  • max_frames_queue_size: Maximum queue size of frames used by the tracker; when depth_data_flag is set, the recommended value is max(3, rgbFPS / depthFPS).
  • offline_work_i_e_dont_use_time: An integer, 1 means enabled, 0 means disabled. Default value is disabled. When enabled, the check with max_detector_confirm_wait_time is not performed. Also, max_occlusion_count_wait will be used instead of max_occlusion_time_wait.
  • max_occlusion_time_wait: A real number in seconds. When the tracker detects face occlusion, it holds the face position and tries to track it on new frames during this time period.
  • max_occlusion_count_wait: An integer. The only difference from max_occlusion_time_wait is that in this case time is measured in frames instead of seconds. Used only when offline_work_i_e_dont_use_time is enabled.
  • fda_max_bad_count_wait: An integer. When fda_tracker detects decline in the face quality, it tries to track that face with the general purpose tracker (instead of the fda method designed and tuned for faces) during at most fda_max_bad_count_wait frames.
  • base_angle: An integer: 0, 1, 2, or 3. Set camera orientation: 0 means standard (default), 1 means +90 degrees, 2 means -90 degrees, 3 means 180 degrees. When you change camera orientation, you need to set the new orientation value for this parameter, otherwise the detection quality will decrease.
  • fake_detections_cnt: An integer. Number of start positions used to search for a face with video_worker_fdatracker_fake_detector.xml. A start position is a fixed position of the face in the image. You can set the coordinates of a start position if you are sure that there is a face in the given area of the image. The image with the marked start position goes to the fake detector, which sends the image directly to the fitter. It is assumed that the image already contains a face, so the fitting of anthropometric points can start immediately.
  • fake_detections_period: An integer. Each start position will be used once in fake_detections_period frames.
  • fake_rect_center_xN, fake_rect_center_yN, fake_rect_angleN, fake_rect_sizeN: Real numbers for parameters of start positions. N is from 0 to fake_detections_cnt – 1 inclusive. fake_rect_center_xN is X coordinate of a center relative to the image width. fake_rect_center_yN is Y coordinate of a center relative to the image height. fake_rect_angleN is a roll angle in degrees. fake_rect_sizeN is size relative to max(image width, image height).
  • downscale_rawsamples_to_preferred_size: An integer, 1 means enabled, 0 means disabled. The default value is enabled. When enabled, VideoWorker downscales each sample to the suitable size (see RawSample.downscaleToPreferredSize) to reduce memory consumption. However, it decreases the performance. We recommend that you disable downscale_rawsamples_to_preferred_size and use RawSample.downscaleToPreferredSize manually for RawSamples that you need to save or keep in RAM for a long time.
  • squeeze_match_found_callback_groups: An integer, 1 means enabled, 0 means disabled. Default value is disabled. When the track gets the first N consecutive matches with the same person from the database, all N matches will be reported (where N = value of consecutive_match_count_for_match_found_callback), i.e. there will be N consecutive VideoWorker.MatchFoundCallbackU calls. But if squeeze_match_found_callback_groups is enabled, only one match will be reported – the one with the minimum distance to the database.
  • debug_log_enabled: An integer, 1 means enabled, 0 means disabled. Default value is disabled. This can also be enabled by setting the environment variable FACE_SDK_VIDEOWORKER_DEBUG_LOG_ENABLED to 1. If enabled, VideoWorker logs the result of its work in std::clog.
  • need_stable_results: An integer, 1 means enabled, 0 means disabled. Default value is disabled. The parameter disables a few optimizations that can produce slightly different results due to the unstable order of multithreaded computations. It also enables offline_work_i_e_dont_use_time, sets max_detection_period to 1, and disables frame skipping (the limit set by max_frames_queue_size).
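The relative start-position parameters fake_rect_center_xN, fake_rect_center_yN and fake_rect_sizeN from the list above can be converted to pixel values with the formulas given in their description. The sketch below applies those formulas; PixelStartPosition and toPixels are hypothetical names introduced for this example and are not part of the SDK.

```cpp
#include <algorithm>
#include <cassert>

// Start position expressed in pixels (hypothetical structure).
struct PixelStartPosition {
    double center_x;
    double center_y;
    double size;
};

// Converts relative start-position values to pixels:
//   x is relative to the image width,
//   y is relative to the image height,
//   size is relative to max(image width, image height).
inline PixelStartPosition toPixels(double rel_center_x,
                                   double rel_center_y,
                                   double rel_size,
                                   int image_width,
                                   int image_height)
{
    assert(image_width > 0 && image_height > 0);
    return {rel_center_x * image_width,
            rel_center_y * image_height,
            rel_size * std::max(image_width, image_height)};
}
```

For a 1280x720 image, a start position with relative center (0.5, 0.5) and relative size 0.25 corresponds to a center at (640, 360) and a size of 320 pixels.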