I have tried the following networks using the demo frameworks in C++ and Python:
- facial-landmarks-35-adas-0002
- head-pose-estimation-adas-0001
With identical inputs, preprocessing, and post-processing, they produce slightly different outputs.
When the networks are stacked in a pipeline such as
face-detection-adas-0001 ==> facial-landmarks-35-adas-0002,
the landmark locations can differ by up to 30 pixels.
What is the reason for this discrepancy? My understanding is that, at the backend, both use the same compiled code.
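Even when two frontends share the same inference backend, bit-exact agreement also requires bit-identical input tensors and the same order of floating-point operations. As a hedged, purely illustrative sketch (this is not code from either demo), the following shows deterministically how float32 arithmetic performed in a different order yields different results:

```python
import numpy as np

# Even with the same backend, two pipelines that do the same arithmetic in a
# different order can produce different float32 results. Toy illustration:
vals = np.float32([1e8, 1.0, -1e8])

left_to_right = np.float32(0)
for v in vals:
    left_to_right = np.float32(left_to_right + v)  # (1e8 + 1) rounds back to 1e8 in float32

reordered = np.float32(vals[0] + vals[2]) + np.float32(vals[1])  # (1e8 - 1e8) + 1

print(left_to_right, reordered)  # 0.0 1.0
```

The same effect appears in image resizing and normalization, which is why preprocessing differences between two demo codebases can surface as small output differences even on identical hardware.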
Hi Saeidn95,
Thank you for reaching out to us.
Which OpenVINO™ version did you install on your system? Could you also provide the inference code for replication and further investigation? If you are using an Open Model Zoo demo, please let us know which demo you used.
On the other hand, did you get different outputs for both the C++ and Python inference?
Regards,
Megat
Thanks Megat, here are more details.
OpenVINO version: openvino_2023.3.0 for both C++ and Python
Device: CPU
OpenVINO Open Model Zoo demos used:
For C++ I used the gaze_estimation_demo: https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/gaze_estimation_demo
For Python I used the face_recognition_demo: https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/face_recognition_demo/python
OpenVINO Open Model Zoo Intel network used for face detection: gaze-estimation-adas-0002 (FP32 version)
https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/intel/gaze-estimation-adas-0002
For both demos I passed the above network for face detection. Both demos do exactly the same pre-processing (i.e., resizing the same image to rows=384 x cols=672). However, I get different outputs from the C++ and Python inferences. The raw x, y position and the width and height of the output rectangle differ by a few pixels in the x and y directions. I inserted printouts in the code to inspect the values of the x, y position and the width and height. Even the confidence values do not match exactly beyond the third decimal place.
I think this should be enough for you to inspect what is going on. However, to take this a bit further, the following is in order.
When the output of this network is passed to the following OpenVINO Open Model Zoo Intel network, the x, y locations of the 35 landmarks produced on the C++ and Python inferences differ by as much as 30 pixels.
facial-landmarks-35-adas-0002: https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/intel/facial-landmarks-35-adas-0002
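The face-box gap and the landmark gap can be related quantitatively: the landmark model outputs coordinates normalized to the face ROI, so pixel positions are recovered as x_norm * roi_w + roi_x. A rough numpy sketch (the box values are the ones printed later in this thread; the normalized landmark value is made up) shows that a sub-pixel box difference alone shifts a landmark by well under a pixel, so a 30-pixel gap implies the normalized network outputs themselves must also differ:

```python
import numpy as np

# Hypothetical normalized landmark (0..1 relative to the face ROI)
lm = np.array([0.62, 0.55])

# Face boxes (x, y, w, h) as printed by the two demos in this thread
roi_cpp = np.array([807.676, 117.246, 256.247, 399.059])
roi_py  = np.array([807.898, 116.801, 255.903, 399.406])

def to_pixels(lm, roi):
    """Map an ROI-normalized landmark to absolute image coordinates."""
    x, y, w, h = roi
    return np.array([lm[0] * w + x, lm[1] * h + y])

shift = np.abs(to_pixels(lm, roi_cpp) - to_pixels(lm, roi_py))
print(shift)  # stays well under one pixel on both axes
```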
Hi Saeidn95,
For your information, I tried replicating your issue on both demos.
For the C++ Gaze Estimation Demo (Gaze Estimation Demo Supported Models), below are the models that I used:
- Gaze Estimation model - gaze-estimation-adas-0002
- Face Detection model - face-detection-adas-0001
- Head Pose Estimation model - head-pose-estimation-adas-0001
- Facial Landmarks Estimation model - facial-landmarks-35-adas-0002
- Open/Closed Eye Estimation model - open-closed-eye-0001
And for the Face Recognition Python* Demo (Face Recognition Python Demo Supported Models), here are the models that I used:
- Face Detection model - face-detection-adas-0001
- Facial Landmarks Detection model - landmarks-regression-retail-0009
- Face Reidentification model - face-reidentification-retail-0095
You mentioned that you used gaze-estimation-adas-0002 for Face Detection. However, face-detection-adas-0001 and face-detection-retail-0004 are the only Face Detection models that are supported on both the Gaze Estimation C++ Demo and Face Recognition Python Demo. Are you perhaps referring to the face-detection-adas-0001 model for face detection?
On the other hand, you also mentioned that the facial-landmarks-35-adas-0002 model produced different results for the 35 landmark locations in the C++ and Python demos. On my end, I was unable to run the facial-landmarks-35-adas-0002 model on the Face Recognition Python Demo as it is not supported; the Python demo only supports the landmarks-regression-retail-0009 model, which provides only 5 landmarks.
If you have modified the inference code for both demos, please provide us with the full inference code files for both the C++ Gaze Estimation Demo and Face Recognition Python* Demo that includes the printouts that output the value of x, y and width, height for further investigation.
Regards,
Megat
Thanks Megat.
Let us focus only on the outputs from the "face-detection-adas-0001" and "facial-landmarks-35-adas-0002" steps in both demos, the C++ "gaze_estimation_demo" and the Python "face_recognition_demo".
1. Face Detection step: Face Detection model - "face-detection-adas-0001"
The outputs from this network are different in the C++ Gaze Estimation Demo and the Face Recognition Python* Demo.
To see the difference in the output:
- For the C++ code, I inserted the following at line 99 in "demos/gaze_estimation_demo/cpp/src/face_detector.cpp":
std::cout << "x: " << x << " y: " << y << " w: " << width << " h: " << height << "\n";
- For the Python code, I inserted the following at line 100 in "face_recognition_demo/python/face_detector.py":
print("x: ", self.position[0], " y: ", self.position[1], " w: ", self.size[0], " h: ", self.size[1])
2. Landmark Detection step: Landmark Detection model - "facial-landmarks-35-adas-0002"
The outputs from this network are different in the C++ Gaze Estimation Demo and the Face Recognition Python* Demo.
To see the difference in the output:
- For the C++ code, I inserted the following between lines 51 and 52 in "demos/gaze_estimation_demo/cpp/src/landmarks_estimator.cpp":
std::cout << "x: " << x << " y: " << y << "\n";
- For the Python code, I inserted the following at line 57 in "face_recognition_demo/python/landmarks_detector.py":
self.results_sc = deepcopy(results)
for k in range(len(results)):
    for i in range(len(results[k])):
        self.results_sc[k][i][0] = self.results_sc[k][i][0] * self.rois[k].size[0] + self.rois[k].position[0]
        self.results_sc[k][i][1] = self.results_sc[k][i][1] * self.rois[k].size[1] + self.rois[k].position[1]
        print("x: ", self.results_sc[k][i][0], " y: ", self.results_sc[k][i][1])
Furthermore, to make the Python code work with "facial-landmarks-35-adas-0002", I had to modify the following:
line 24:
POINTS_NUMBER = 35
line 38:
if not np.array_equal([1, self.POINTS_NUMBER * 2], output_shape):
    raise RuntimeError("The model expects output shape {}, got {}".format(
        [1, self.POINTS_NUMBER * 2], output_shape))
I also needed to add:
from copy import deepcopy
Hi Saeidn95,
Thank you for sharing the details.
For your information, I was able to replicate the issue you observed with the Face Detection step. However, I received some errors in the Python code and had to change a few lines for it to work. I share the results here:
C++
x: 807.676 y: 117.246 w: 256.247 h: 399.059
Python
x: 807.89813 y: 116.80127 w: 255.90338 h: 399.40616
On the other hand, for Landmark Detection I was able to show the landmarks in the C++ code. However, the Python code resulted in the error "AttributeError: 'LandmarksDetector' object has no attribute 'rois'". Are there any other modifications that need to be made to the Python demo to successfully run the "facial-landmarks-35-adas-0002" model? I show my results and my full landmarks_detector.py below:
Python (landmarks_detector.py) code:
import numpy as np

from utils import cut_rois, resize_input
from ie_module import Module
from copy import deepcopy


class LandmarksDetector(Module):
    POINTS_NUMBER = 35

    def __init__(self, core, model):
        super(LandmarksDetector, self).__init__(core, model, 'Landmarks Detection')

        if len(self.model.inputs) != 1:
            raise RuntimeError("The model expects 1 input layer")
        if len(self.model.outputs) != 1:
            raise RuntimeError("The model expects 1 output layer")

        self.input_tensor_name = self.model.inputs[0].get_any_name()
        self.input_shape = self.model.inputs[0].shape
        self.nchw_layout = self.input_shape[1] == 3
        output_shape = self.model.outputs[0].shape
        if not np.array_equal([1, self.POINTS_NUMBER * 2], output_shape):
            raise RuntimeError("The model expects output shape {}, got {}".format(
                [1, self.POINTS_NUMBER * 2], output_shape))

    def preprocess(self, frame, rois):
        inputs = cut_rois(frame, rois)
        inputs = [resize_input(input, self.input_shape, self.nchw_layout) for input in inputs]
        return inputs

    def enqueue(self, input):
        return super(LandmarksDetector, self).enqueue({self.input_tensor_name: input})

    def start_async(self, frame, rois):
        inputs = self.preprocess(frame, rois)
        for input in inputs:
            self.enqueue(input)

    def postprocess(self):
        results = [out.reshape((-1, 2)).astype(np.float64) for out in self.get_outputs()]
        self.results_sc = deepcopy(results)
        for k in range(len(results)):
            for i in range(len(results[k])):
                self.results_sc[k][i][0] = self.results_sc[k][i][0] * self.rois[k].size[0] + self.rois[k].position[0]
                self.results_sc[k][i][1] = self.results_sc[k][i][1] * self.rois[k].size[1] + self.rois[k].position[1]
                print("x: ", self.results_sc[k][i][0], " y: ", self.results_sc[k][i][1])
        return results
Regards,
Megat
Dear Megat,
1. Glad to know that you could replicate the discrepancy between the C++ and Python outputs of the Face Detection step with the "face-detection-adas-0001" model. The discrepancy can grow to a few pixels on some images.
2. For the Landmark Detection step with the "facial-landmarks-35-adas-0002" model, you will also need to add one more line to the code. Apologies for not mentioning it before. At line 51, in the function start_async(self, frame, rois) of "face_recognition_demo/python/landmarks_detector.py", please insert:
self.rois = rois
This will make rois accessible to the rest of the code.
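The idea behind this one line can be sketched in a minimal, self-contained form (with a stand-in ROI object, not the demo's real classes): postprocess() denormalizes landmarks using ROIs that only exist at start_async() time, so they must be stashed on the instance.

```python
from types import SimpleNamespace

class LandmarksDetectorSketch:
    """Toy version of the pattern: stash ROIs at enqueue time so
    postprocess() can map normalized landmarks back to image pixels."""

    def start_async(self, rois):
        self.rois = rois  # the one-line fix; without it, postprocess() raises AttributeError
        # Stand-in for the network's normalized output: one landmark per ROI
        self.raw = [[(0.5, 0.5)] for _ in rois]

    def postprocess(self):
        return [[(x * roi.size[0] + roi.position[0],
                  y * roi.size[1] + roi.position[1]) for x, y in lms]
                for roi, lms in zip(self.rois, self.raw)]

roi = SimpleNamespace(position=(100.0, 50.0), size=(200.0, 200.0))
d = LandmarksDetectorSketch()
d.start_async([roi])
print(d.postprocess())  # [[(200.0, 150.0)]]
```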
Hi Saeidn95,
For your information, we have escalated the Face Detection issue to the relevant team for further investigation.
On the other hand, after inserting self.rois = rois, I was able to get the first landmark; however, I received an error before getting the second landmark. The error seems to happen in face_identifier.py. Did you encounter such an error before? I share the result below:
Regards,
Megat
Thanks Megat,
I am glad that you have reached the same point that I reached.
As for receiving an error before getting the second landmark: yes, you will see an error in face_identifier.py. To rectify it, you need to make a slight modification to the code in "demos/face_recognition_demo/python/face_recognition_demo.py". Please insert the following two lines before line 158:
landmawks = self.landmarks_detector.results_sc
landmarks = [lm[np.ix_([1, 3, 4, 8, 9], [0, 1])] for lm in landmawks]
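For anyone unfamiliar with the np.ix_ trick used in that second line: applied to a (35, 2) landmark array, it selects whole rows (the chosen landmark indices) across both columns. A small standalone illustration with made-up values:

```python
import numpy as np

# 35 landmarks as a (35, 2) array of (x, y); the values here are made up
lm35 = np.arange(70, dtype=float).reshape(35, 2)

# np.ix_ builds an open mesh, so this picks rows 1, 3, 4, 8, 9 with both columns
lm5 = lm35[np.ix_([1, 3, 4, 8, 9], [0, 1])]
print(lm5.shape)  # (5, 2)
print(lm5[0])     # [2. 3.]  (row 1 of the original array)
```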
Hi Saeidn95,
Thank you for the guide you provided.
I was able to run the Python demo for the landmarks detection without any errors. However, my results only show one landmark instead of 35, and the landmarks did not appear in the image results. It seems the code was unable to draw all 35 landmarks. I share the result here:
To help ease this investigation, is it possible for you to provide all the Python files in the ..\open_model_zoo\demos\face_recognition_demo\python folder? If you would like to send them to me privately, you can email the files to megatx.muhammad.firdaus.bin.sahrir@intel.com.
Regards,
Megat
Hello Megat,
I think you have everything you need to establish the difference in the outputs of the two models, "face-detection-adas-0001" and "facial-landmarks-35-adas-0002". You have already established that they produce different outputs by inserting the print statements into both the C++ and Python code. That is where we need to focus the investigation: why do two code bases produce different outputs on the same two models?
The reason you only get five points and not 35 is that the face_identifier.py used by "face_recognition_demo.py" can only accept five points. The last line of code I sent you, "landmarks = [lm[np.ix_([1, 3, 4, 8, 9], [0, 1])] for lm in landmawks]", selects the five relevant key points out of the 35 for face_identifier.py to use. If you want all 35 points displayed, you can make a simple change to the code at line 158 in "face_recognition_demo.py". This change ensures that all 35 points are passed on for display, while face_identifier.py sees only the five relevant key points:
landmawks = self.landmarks_detector.results_sc
landmarks_five = [lm[np.ix_([1, 3, 4, 8, 9], [0, 1])] for lm in landmawks]
face_identities, unknowns = self.face_identifier.infer((frame, rois, landmarks_five))
In the "face_recognition_demo.py" in the OpenVINO Model Zoo, the sample landmark model is "landmarks-regression-retail-0009.xml". This model only produces 5 key points, so no selection is needed.
The reason you do not see these five points is most likely that they are off the scale. That is really the topic of this discussion: the Python code does not place the points correctly.
It is hard for me to share my code with you because I have made so many changes on top of it that it would be useless to you. But you have everything you need to find out what is going on.
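Since the suspicion is that the Python-side points land off the scale, a quick diagnostic is to test every scaled landmark against the image bounds. This helper is just an illustrative sketch, not part of either demo:

```python
import numpy as np

def in_bounds(points, width, height):
    """Mask of landmarks that actually fall inside a width x height image."""
    pts = np.asarray(points, dtype=float)
    return (pts[:, 0] >= 0) & (pts[:, 0] < width) & \
           (pts[:, 1] >= 0) & (pts[:, 1] < height)

# Made-up points: one inside a 1920x1080 frame, two outside it
pts = np.array([[966.5, 336.7], [-12.0, 400.0], [2100.0, 90.0]])
print(in_bounds(pts, 1920, 1080))  # [ True False False]
```

If the mask comes back mostly False for the scaled Python landmarks, the denormalization (ROI offset and scale) is the first place to look.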
Hi Saeidn95,
We have tried comparing the results on our end from the modified demos.
From our investigation, we did not encounter huge differences; the pixel differences were minor and did not impact the accuracy of our results. We show the results here:
Face Detection model
C++
x: 807.676 y: 117.246 w: 256.247 h: 399.059
Python
x: 807.89813 y: 116.80127 w: 255.90338 h: 399.40616
Landmarks Detection model
C++
0: x: 903 y: 283
1: x: 856 y: 283
2: x: 981 y: 282
3: x: 1027 y: 279
4: x: 944 y: 346
5: x: 944 y: 373
6: x: 906 y: 363
7: x: 979 y: 361
8: x: 892 y: 420
9: x: 992 y: 418
10: x: 943 y: 402
11: x: 943 y: 447
12: x: 831 y: 256
13: x: 870 y: 234
14: x: 915 y: 249
15: x: 969 y: 248
16: x: 1013 y: 231
17: x: 1051 y: 252
18: x: 804 y: 285
19: x: 805 y: 322
20: x: 810 y: 358
21: x: 817 y: 393
22: x: 828 y: 427
23: x: 847 y: 459
24: x: 872 y: 486
25: x: 904 y: 506
26: x: 944 y: 513
27: x: 981 y: 505
28: x: 1011 y: 484
29: x: 1034 y: 457
30: x: 1052 y: 425
31: x: 1062 y: 391
32: x: 1069 y: 356
33: x: 1073 y: 319
34: x: 1074 y: 281
Python
0 : x: 906.0463315558754 y: 282.3578870739293
1 : x: 861.145753608238 y: 280.41236441835645
2 : x: 977.3058866649917 y: 281.0067285298137
3 : x: 1020.4552166107824 y: 278.4261315171025
4 : x: 945.6485149229011 y: 348.75262920354726
5 : x: 944.9155824425288 y: 380.5634290122107
6 : x: 908.2674060468216 y: 365.3378792597796
7 : x: 978.3399978426642 y: 364.65954855551536
8 : x: 893.7569865256628 y: 422.0812268902664
9 : x: 990.7621368931395 y: 418.60144540603505
10 : x: 944.522629316325 y: 409.4526732340455
11 : x: 944.7058975183863 y: 439.08888338941324
12 : x: 838.6659738548715 y: 250.02883242280222
13 : x: 876.46855749733 y: 225.04719686022145
14 : x: 918.066861004485 y: 243.57602577414946
15 : x: 967.81767015902 y: 242.74457307654666
16 : x: 1007.6137954169426 y: 223.89676966871775
17 : x: 1041.3190244318175 y: 247.90205745954154
18 : x: 810.6795244533491 y: 286.8375746126403
19 : x: 811.4277329382649 y: 328.7259260719293
20 : x: 815.1902236011733 y: 369.0622781247512
21 : x: 821.0280604516015 y: 409.1805689086759
22 : x: 831.1949880634147 y: 447.1738505738904
23 : x: 850.8629392922285 y: 479.66404880677874
24 : x: 877.2758198979009 y: 503.0393407546362
25 : x: 908.3212217860946 y: 518.3458455820364
26 : x: 945.2001498278696 y: 522.6027903104841
27 : x: 978.2252096425982 y: 515.5027811010368
28 : x: 1005.0076261606773 y: 498.1616400674975
29 : x: 1027.358926975874 y: 473.89989415534365
30 : x: 1043.546255808189 y: 441.688973135766
31 : x: 1052.1247442548665 y: 405.3227598034864
32 : x: 1057.2115244030829 y: 366.8379710496956
33 : x: 1060.253744983529 y: 328.09794247042737
34 : x: 1060.7353502434635 y: 286.69371988101193
We ran the inference using the attached sample image. Could you please try running your code on our example image and see whether you get the same results as ours?
If you do, please provide the sample image/video that gave you the 30-pixel difference so that we can replicate it.
Regards,
Megat
Hello Megat,
Thanks for the due diligence.
Yes, I am getting similar results on your test image.
In your outputs for point 33 you get:
C++: 33: x: 1073 y: 319
Python: 33: x: 1060.253744983529 y: 328.09794247042737
This is a difference of dx = 13 and dy = 9.
For some images I get a difference as large as 30 pixels.
Five points for your consideration as you improve your tools:
a. I was just reporting my observations, which I thought the software developers on the OpenVINO team at Intel would want to know about in their development efforts. What you do with it is your call, of course.
b. It matters in a chain of networks for gaze estimation, because the inaccuracies propagate and amplify; the pose estimation can end up off by a few degrees.
c. It is not so much the difference itself, but where does the difference come from? My understanding is that the backend is the same for both C++ and Python.
d. How do you measure accuracy? Is dx = 13 accurate enough? How far does dx/dy have to deviate before it is called inaccurate?
e. Is there an upper bound on the difference? As I said, I saw differences as high as 30 pixels before I reported my observations to you. Unfortunately, I cannot remember which image I used, so I cannot send it to you to replicate; the images were grayscale, for use in a driver-monitoring system. You can try a few different grayscale images that are available to you.
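The per-point gap for point 33 can be put in pixels with a two-line computation, using the values quoted from the two runs above:

```python
import numpy as np

# Point 33 as printed by the C++ and Python demos earlier in this thread
cpp = np.array([1073.0, 319.0])
py  = np.array([1060.253744983529, 328.09794247042737])

dxdy = np.abs(cpp - py)        # per-axis difference in pixels
dist = float(np.hypot(*dxdy))  # Euclidean distance, roughly 15.7 px here
print(dxdy, round(dist, 1))
```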
Hi Saeidn95,
Thank you for highlighting your findings to us.
We will convey this information to the appropriate teams for further clarification regarding this issue. We will also test other images, including grayscale images, to see whether we observe any large differences.
Regards,
Megat
Hi Saeidn95,
Thank you for the feedback and findings you provided.
We will continue to improve the OpenVINO™ Toolkit in future releases, and we appreciate your support. Thank you for your question. If you need any additional information from Intel, please submit a new question, as this thread is no longer being monitored.
Regards,
Megat
