Version: 1.1.0

Benchmarks

Load Testing

Load testing evaluates the quality and speed of Image API operation under a given load over a given time interval. The test image is a 438 KB JPEG (1024x1024 px).

Note: larger images take longer to process, so request time increases with image size.

Specification of the test system:

CPU: AMD Ryzen 9 5950X 16-Core (32 threads)
GPU: GeForce GTX 1080 Ti
RAM: 120 GB

Load testing parameters:

RPS: number of requests per second
Replicas: number of running replicas of the service
Request time (ms) AVG: average time per request, in milliseconds
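
The parameters above can be derived from raw timings of a test run. The sketch below is illustrative only: it assumes RPS is counted as completed requests divided by wall-clock time (the source does not say how it was measured), the helper name `summarize` is hypothetical, and the timings are made up.

```python
from statistics import mean


def summarize(durations_s: list[float], elapsed_s: float) -> tuple[float, float]:
    """Return (RPS, average request time in ms) for one test run.

    durations_s: time spent on each completed request, in seconds.
    elapsed_s:   wall-clock length of the whole run, in seconds.
    """
    if not durations_s or elapsed_s <= 0:
        raise ValueError("need at least one request and a positive run length")
    rps = len(durations_s) / elapsed_s
    avg_ms = mean(durations_s) * 1000.0
    return rps, avg_ms


# Made-up timings for illustration: 4 requests completed in a 2-second run.
rps, avg_ms = summarize([0.070, 0.080, 0.075, 0.085], elapsed_s=2.0)
print(f"RPS: {rps:.1f}, Request time (ms) AVG: {avg_ms:.2f}")
```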

The results of Image API load testing are given below:

Service	RPS	Replicas	Request time (ms) AVG
face-detector-face-fitter	1	1	77.25
face-detector-face-fitter	112	88	583.93
age-estimator	1	1	34.79
age-estimator	192	32	634.21
gender-estimator	1	1	35.05
gender-estimator	176	48	620.79
verify-matcher	1	1	4.08
verify-matcher	64	20	23.34
quality-assessment-estimator	1	1	74.08
quality-assessment-estimator	96	80	632.52
face-detector-template-extractor (GPU)	1	1	105.80
face-detector-template-extractor (GPU)	8	1	674.42
face-detector-template-extractor (CPU)	1	1	481.53
face-detector-template-extractor (CPU)	4	18	564.23
body-detector	1	1	171.94
body-detector	16	32	534.46
emotion-estimator	1	1	49.17
emotion-estimator	96	32	653.64
mask-estimator	1	1	35.01
mask-estimator	192	96	686.04
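
One way to compare the rows above is to normalize throughput by the number of replicas. The following is a minimal sketch using a few rows copied from the table; the helper name `rps_per_replica` is hypothetical and is not part of Image API.

```python
# Rows copied from the load-testing table:
# (service, RPS, replicas, average request time in ms).
ROWS = [
    ("face-detector-face-fitter", 112, 88, 583.93),
    ("age-estimator", 192, 32, 634.21),
    ("verify-matcher", 64, 20, 23.34),
    ("mask-estimator", 192, 96, 686.04),
]


def rps_per_replica(rps: float, replicas: int) -> float:
    """Throughput handled by a single replica, in requests per second."""
    if replicas <= 0:
        raise ValueError("replicas must be positive")
    return rps / replicas


for service, rps, replicas, avg_ms in ROWS:
    per_replica = rps_per_replica(rps, replicas)
    print(f"{service}: {per_replica:.2f} RPS per replica at {avg_ms} ms average")
```

This ratio is only a rough guide for sizing a deployment, since the replicas in the test shared one machine.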

Accuracy Testing

Accuracy of age-estimator, gender-estimator and emotion-estimator

Service	Accuracy
age-estimator	+/- 3.95 years
gender-estimator	95%
emotion-estimator	80%

Accuracy of mask-estimator

To calculate operation accuracy, the following metrics are used:

Precision: the share of results reported as positive by the service that are actually positive (true positives relative to all positive results).
Recall: the share of actually positive cases that the service detected (true positives relative to all results that should be positive).
F1 score: combines precision and recall into a single accuracy measure. It is the harmonic mean of the two, F1 = 2 * precision * recall / (precision + recall), so it equals 1 only when both precision and recall equal 1, and it approaches zero when either of them approaches zero.

Metric	Value
Precision	0.9967532468
Recall	0.9903225806
F1 score	0.9935275081
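
As a consistency check, the F1 score can be recomputed from the precision and recall reported in the table above. This is a minimal sketch; the function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for mask-estimator in the table above.
precision = 0.9967532468
recall = 0.9903225806

f1 = f1_score(precision, recall)
print(f"F1 score: {f1:.10f}")
```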

Accuracy of face-detector-liveness-estimator

To calculate operation accuracy, the following metrics are used:

APCER (Attack Presentation Classification Error Rate): the proportion of attack presentations in the validation dataset that were classified as real biometric presentations.
BPCER (Bona Fide Presentation Classification Error Rate): the proportion of real biometric presentations that were classified as attacks.
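
The two definitions above can be sketched in code as follows. The function names and the toy labels are invented for illustration and are not the validation dataset. In the table below, APCER is reported separately for each attack type, which corresponds to running `apcer` on the samples of that type only.

```python
from typing import Sequence


def apcer(is_attack: Sequence[bool], predicted_real: Sequence[bool]) -> float:
    """Proportion of attack presentations that were classified as real."""
    attacks = [pred for truth, pred in zip(is_attack, predicted_real) if truth]
    if not attacks:
        raise ValueError("no attack presentations in the data")
    return sum(1 for pred in attacks if pred) / len(attacks)


def bpcer(is_attack: Sequence[bool], predicted_real: Sequence[bool]) -> float:
    """Proportion of real presentations that were classified as attacks."""
    real = [pred for truth, pred in zip(is_attack, predicted_real) if not truth]
    if not real:
        raise ValueError("no real presentations in the data")
    return sum(1 for pred in real if not pred) / len(real)


# Toy data: 4 attack samples followed by 5 real samples.
is_attack = [True, True, True, True, False, False, False, False, False]
predicted_real = [False, True, False, False, True, True, False, True, True]

print(f"APCER: {apcer(is_attack, predicted_real):.2f}")
print(f"BPCER: {bpcer(is_attack, predicted_real):.2f}")
```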

Image Type	Metric	Value
real face	BPCER	0.29981
photo	APCER	0.04911
photo without background	APCER	0.12
replay attack	APCER	0.01339
2D mask	APCER	0.02888
3D mask	APCER	0.01333

Note: the average request time is 0.3 s.
