I have tried the following networks using the demo frameworks in C++ and Python:
- facial-landmarks-35-adas-0002
- head-pose-estimation-adas-0001
With identical inputs, preprocessing, and post-processing, they produce slightly different outputs.
When the networks are stacked in a pipeline such as
face-detection-adas-0001 ==> facial-landmarks-35-adas-0002,
the landmark locations can differ by up to 30 pixels.
What is the reason for this discrepancy? My understanding is that, at the backend, both use the same compiled code.
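Even when two frontends share the same inference backend, bit-exact agreement also requires bit-identical input tensors and the same order of floating-point operations. As a hedged, purely illustrative sketch (this is not code from either demo), the following shows deterministically how float32 arithmetic performed in a different order yields different results:

```python
import numpy as np

# Even with the same backend, two pipelines that do the same arithmetic in a
# different order can produce different float32 results. Toy illustration:
vals = np.float32([1e8, 1.0, -1e8])

left_to_right = np.float32(0)
for v in vals:
    left_to_right = np.float32(left_to_right + v)  # (1e8 + 1) rounds back to 1e8 in float32

reordered = np.float32(vals[0] + vals[2]) + np.float32(vals[1])  # (1e8 - 1e8) + 1

print(left_to_right, reordered)  # 0.0 1.0
```

The same effect appears in image resizing and normalization, which is why preprocessing differences between two demo codebases can surface as small output differences even on identical hardware.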
Hi Saeidn95,
Thank you for reaching out to us.
Which OpenVINO™ version did you install on your system? Could you also provide the inference code for replication and further investigation? If you are using an Open Model Zoo demo, please let us know which demo you used.
On the other hand, did you get different outputs for both the C++ and Python inference?
Regards,
Megat
Thanks Megat, here are more details.
OpenVINO version: openvino_2023.3.0 for both C++ and Python
Device: CPU
OpenVINO Open Model Zoo demos used:
For C++ I used the gaze_estimation_demo: https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/gaze_estimation_demo
For Python I used the face_recognition_demo: https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/face_recognition_demo/python
OpenVINO Open Model Zoo Intel network used for face detection: gaze-estimation-adas-0002 (FP32 version)
https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/intel/gaze-estimation-adas-0002
For both demos I passed the above network for face detection. Both demos do exactly the same pre-processing (i.e., resizing the same image to rows=384 x cols=672). However, I get different outputs from the C++ and Python inferences. The raw x, y position and the width and height of the output rectangle differ by a few pixels in the x and y directions. I inserted printouts in the code to inspect the values of the x, y position and the width and height. Even the confidence values do not match exactly beyond the third decimal place.
I think this should be enough for you to inspect what is going on. However, to take this a bit further, the following is in order.
When the output of this network is passed to the following OpenVINO Open Model Zoo Intel network, the x, y locations of the 35 landmarks produced on the C++ and Python inferences differ by as much as 30 pixels.
facial-landmarks-35-adas-0002: https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/intel/facial-landmarks-35-adas-0002
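The face-box gap and the landmark gap can be related quantitatively: the landmark model outputs coordinates normalized to the face ROI, so pixel positions are recovered as x_norm * roi_w + roi_x. A rough numpy sketch (the box values are the ones printed later in this thread; the normalized landmark value is made up) shows that a sub-pixel box difference alone shifts a landmark by well under a pixel, so a 30-pixel gap implies the normalized network outputs themselves must also differ:

```python
import numpy as np

# Hypothetical normalized landmark (0..1 relative to the face ROI)
lm = np.array([0.62, 0.55])

# Face boxes (x, y, w, h) as printed by the two demos in this thread
roi_cpp = np.array([807.676, 117.246, 256.247, 399.059])
roi_py  = np.array([807.898, 116.801, 255.903, 399.406])

def to_pixels(lm, roi):
    """Map an ROI-normalized landmark to absolute image coordinates."""
    x, y, w, h = roi
    return np.array([lm[0] * w + x, lm[1] * h + y])

shift = np.abs(to_pixels(lm, roi_cpp) - to_pixels(lm, roi_py))
print(shift)  # stays well under one pixel on both axes
```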
Hi Saeidn95,
For your information, I tried replicating your issue on both demos.
For the C++ Gaze Estimation Demo (Gaze Estimation Demo Supported Models), below are the models that I used:
- Gaze Estimation model - gaze-estimation-adas-0002
- Face Detection model - face-detection-adas-0001
- Head Pose Estimation model - head-pose-estimation-adas-0001
- Facial Landmarks Estimation model - facial-landmarks-35-adas-0002
- Open/Closed Eye Estimation model - open-closed-eye-0001
And for the Face Recognition Python* Demo (Face Recognition Python Demo Supported Models), here are the models that I used:
- Face Detection model - face-detection-adas-0001
- Facial Landmarks Detection model - landmarks-regression-retail-0009
- Face Reidentification model - face-reidentification-retail-0095
You mentioned that you used gaze-estimation-adas-0002 for Face Detection. However, face-detection-adas-0001 and face-detection-retail-0004 are the only Face Detection models that are supported on both the Gaze Estimation C++ Demo and Face Recognition Python Demo. Are you perhaps referring to the face-detection-adas-0001 model for face detection?
On the other hand, you also mentioned that the facial-landmarks-35-adas-0002 model produced different results for the 35 landmark locations in the C++ and Python demos. On my end, I was unable to run the facial-landmarks-35-adas-0002 model on the Face Recognition Python Demo as it is not supported; the Python demo only supports the landmarks-regression-retail-0009 model, which provides only 5 landmarks.
If you have modified the inference code for both demos, please provide us with the full inference code files for both the C++ Gaze Estimation Demo and Face Recognition Python* Demo that includes the printouts that output the value of x, y and width, height for further investigation.
Regards,
Megat
Thanks Megat.
Let us focus only on the outputs from the "face-detection-adas-0001" and "facial-landmarks-35-adas-0002" steps in both demos, the C++ "gaze_estimation_demo" and the Python "face_recognition_demo".
1. Face Detection step: Face Detection model - "face-detection-adas-0001"
The outputs from this network are different in the C++ Gaze Estimation Demo and the Face Recognition Python* Demo.
To see the difference in the output:
- For the C++ code, I inserted the following at line 99 in "demos/gaze_estimation_demo/cpp/src/face_detector.cpp":
std::cout << "x: " << x << " y: " << y << " w: " << width << " h: " << height << "\n";
- For the Python code, I inserted the following at line 100 in "face_recognition_demo/python/face_detector.py":
print("x: ", self.position[0], " y: ", self.position[1], " w: ", self.size[0], " h: ", self.size[1])
2. Landmark Detection step: Landmark Detection model - "facial-landmarks-35-adas-0002"
The outputs from this network are different in the C++ Gaze Estimation Demo and the Face Recognition Python* Demo.
To see the difference in the output:
- For the C++ code, I inserted the following between lines 51 and 52 in "demos/gaze_estimation_demo/cpp/src/landmarks_estimator.cpp":
std::cout << "x: " << x << " y: " << y << "\n";
- For the Python code, I inserted the following at line 57 in "face_recognition_demo/python/landmarks_detector.py":
self.results_sc = deepcopy(results)
for k in range(len(results)):
    for i in range(len(results[k])):
        self.results_sc[k][i][0] = self.results_sc[k][i][0] * self.rois[k].size[0] + self.rois[k].position[0]
        self.results_sc[k][i][1] = self.results_sc[k][i][1] * self.rois[k].size[1] + self.rois[k].position[1]
        print("x: ", self.results_sc[k][i][0], " y: ", self.results_sc[k][i][1])
Furthermore, to make the Python code work with "facial-landmarks-35-adas-0002", I had to modify the following:
line 24:
POINTS_NUMBER = 35
line 38:
if not np.array_equal([1, self.POINTS_NUMBER * 2], output_shape):
    raise RuntimeError("The model expects output shape {}, got {}".format(
        [1, self.POINTS_NUMBER * 2], output_shape))
I also needed to add:
from copy import deepcopy
Hi Saeidn95,
Thank you for sharing the details.
For your information, I was able to replicate the issue you observed with the Face Detection step. However, I received some errors in the Python code and had to change a few lines for it to work. I share the results here:
C++
x: 807.676 y: 117.246 w: 256.247 h: 399.059
Python
x: 807.89813 y: 116.80127 w: 255.90338 h: 399.40616
On the other hand, for Landmark Detection I was able to show the landmarks in the C++ code. However, the Python code resulted in the error "AttributeError: 'LandmarksDetector' object has no attribute 'rois'". Are there any other modifications that need to be made to the Python demo to successfully run the "facial-landmarks-35-adas-0002" model? I show my results and my full landmarks_detector.py below:
Python (landmarks_detector.py) code:
import numpy as np

from utils import cut_rois, resize_input
from ie_module import Module
from copy import deepcopy


class LandmarksDetector(Module):
    POINTS_NUMBER = 35

    def __init__(self, core, model):
        super(LandmarksDetector, self).__init__(core, model, 'Landmarks Detection')

        if len(self.model.inputs) != 1:
            raise RuntimeError("The model expects 1 input layer")
        if len(self.model.outputs) != 1:
            raise RuntimeError("The model expects 1 output layer")

        self.input_tensor_name = self.model.inputs[0].get_any_name()
        self.input_shape = self.model.inputs[0].shape
        self.nchw_layout = self.input_shape[1] == 3
        output_shape = self.model.outputs[0].shape
        if not np.array_equal([1, self.POINTS_NUMBER * 2], output_shape):
            raise RuntimeError("The model expects output shape {}, got {}".format(
                [1, self.POINTS_NUMBER * 2], output_shape))

    def preprocess(self, frame, rois):
        inputs = cut_rois(frame, rois)
        inputs = [resize_input(input, self.input_shape, self.nchw_layout) for input in inputs]
        return inputs

    def enqueue(self, input):
        return super(LandmarksDetector, self).enqueue({self.input_tensor_name: input})

    def start_async(self, frame, rois):
        inputs = self.preprocess(frame, rois)
        for input in inputs:
            self.enqueue(input)

    def postprocess(self):
        results = [out.reshape((-1, 2)).astype(np.float64) for out in self.get_outputs()]
        self.results_sc = deepcopy(results)
        for k in range(len(results)):
            for i in range(len(results[k])):
                self.results_sc[k][i][0] = self.results_sc[k][i][0] * self.rois[k].size[0] + self.rois[k].position[0]
                self.results_sc[k][i][1] = self.results_sc[k][i][1] * self.rois[k].size[1] + self.rois[k].position[1]
                print("x: ", self.results_sc[k][i][0], " y: ", self.results_sc[k][i][1])
        return results
Regards,
Megat
Dear Megat,
1. Glad to know that you could replicate the discrepancy between the C++ and Python outputs of the Face Detection step with the "face-detection-adas-0001" model. The discrepancy can grow to a few pixels on some images.
2. For the Landmark Detection step with the "facial-landmarks-35-adas-0002" model, you will also need to add one more line to the code. Apologies for not mentioning it before. At line 51, in the function start_async(self, frame, rois) of "face_recognition_demo/python/landmarks_detector.py", please insert:
self.rois = rois
This will make rois accessible to the rest of the code.
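The idea behind this one line can be sketched in a minimal, self-contained form (with a stand-in ROI object, not the demo's real classes): postprocess() denormalizes landmarks using ROIs that only exist at start_async() time, so they must be stashed on the instance.

```python
from types import SimpleNamespace

class LandmarksDetectorSketch:
    """Toy version of the pattern: stash ROIs at enqueue time so
    postprocess() can map normalized landmarks back to image pixels."""

    def start_async(self, rois):
        self.rois = rois  # the one-line fix; without it, postprocess() raises AttributeError
        # Stand-in for the network's normalized output: one landmark per ROI
        self.raw = [[(0.5, 0.5)] for _ in rois]

    def postprocess(self):
        return [[(x * roi.size[0] + roi.position[0],
                  y * roi.size[1] + roi.position[1]) for x, y in lms]
                for roi, lms in zip(self.rois, self.raw)]

roi = SimpleNamespace(position=(100.0, 50.0), size=(200.0, 200.0))
d = LandmarksDetectorSketch()
d.start_async([roi])
print(d.postprocess())  # [[(200.0, 150.0)]]
```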
Hi Saeidn95,
For your information, we have escalated the Face Detection issue to the relevant team for further investigation.
On the other hand, after inserting self.rois = rois, I was able to get the first landmark; however, I received an error before getting the second landmark. The error seems to happen in face_identifier.py. Did you encounter such an error before? I share the result below:
Regards,
Megat
Thanks Megat,
I am glad that you have reached the same point that I reached.
As for receiving an error before getting the second landmark: yes, you will see an error in face_identifier.py. To rectify it, you need to make a slight modification to the code in "demos/face_recognition_demo/python/face_recognition_demo.py". Please insert the following two lines before line 158:
landmawks = self.landmarks_detector.results_sc
landmarks = [lm[np.ix_([1, 3, 4, 8, 9], [0, 1])] for lm in landmawks]
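For anyone unfamiliar with the np.ix_ trick used in that second line: applied to a (35, 2) landmark array, it selects whole rows (the chosen landmark indices) across both columns. A small standalone illustration with made-up values:

```python
import numpy as np

# 35 landmarks as a (35, 2) array of (x, y); the values here are made up
lm35 = np.arange(70, dtype=float).reshape(35, 2)

# np.ix_ builds an open mesh, so this picks rows 1, 3, 4, 8, 9 with both columns
lm5 = lm35[np.ix_([1, 3, 4, 8, 9], [0, 1])]
print(lm5.shape)  # (5, 2)
print(lm5[0])     # [2. 3.]  (row 1 of the original array)
```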
Hi Saeidn95,
Thank you for the guide you provided.
I was able to run the Python demo for the landmarks detection without any errors. However, my results only show one landmark instead of 35, and the landmarks did not appear in the image results. It seems the code was unable to draw all 35 landmarks. I share the result here:
To help ease this investigation, is it possible for you to provide all the Python files in the ..\open_model_zoo\demos\face_recognition_demo\python folder? If you would like to send them to me privately, you can email the files to megatx.muhammad.firdaus.bin.sahrir@intel.com.
Regards,
Megat
Hello Megat,
I think you have everything you need to establish the difference in the outputs of the two models, "face-detection-adas-0001" and "facial-landmarks-35-adas-0002". You have already established that they produce different outputs by inserting the print statements into both the C++ and Python code. That is where we need to focus the investigation: why do two code bases produce different outputs on the same two models?
The reason you only get five points and not 35 is that the face_identifier.py used by "face_recognition_demo.py" can only accept five points. The last line of code I sent you, "landmarks = [lm[np.ix_([1, 3, 4, 8, 9], [0, 1])] for lm in landmawks]", selects the five relevant key points out of the 35 for face_identifier.py to use. If you want all 35 points displayed, you can make a simple change to the code at line 158 in "face_recognition_demo.py". This change ensures that all 35 points are passed on for display, while face_identifier.py sees only the five relevant key points:
landmawks = self.landmarks_detector.results_sc
landmarks_five = [lm[np.ix_([1, 3, 4, 8, 9], [0, 1])] for lm in landmawks]
face_identities, unknowns = self.face_identifier.infer((frame, rois, landmarks_five))
In the "face_recognition_demo.py" in the OpenVINO Model Zoo, the sample landmark model is "landmarks-regression-retail-0009.xml". This model only produces 5 key points, so no selection is needed.
The reason you do not see these five points is most likely that they are off the scale. That is really the topic of this discussion: the Python code does not place the points correctly.
It is hard for me to share my code with you because I have made so many changes on top of it that it would be useless to you. But you have everything you need to find out what is going on.
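Since the suspicion is that the Python-side points land off the scale, a quick diagnostic is to test every scaled landmark against the image bounds. This helper is just an illustrative sketch, not part of either demo:

```python
import numpy as np

def in_bounds(points, width, height):
    """Mask of landmarks that actually fall inside a width x height image."""
    pts = np.asarray(points, dtype=float)
    return (pts[:, 0] >= 0) & (pts[:, 0] < width) & \
           (pts[:, 1] >= 0) & (pts[:, 1] < height)

# Made-up points: one inside a 1920x1080 frame, two outside it
pts = np.array([[966.5, 336.7], [-12.0, 400.0], [2100.0, 90.0]])
print(in_bounds(pts, 1920, 1080))  # [ True False False]
```

If the mask comes back mostly False for the scaled Python landmarks, the denormalization (ROI offset and scale) is the first place to look.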
Hi Saeidn95,
We have tried comparing the results on our end from the modified demos.
From our investigation, we did not encounter huge differences; the pixel differences were minor and did not impact the accuracy of our results. We show the results here:
Face Detection model
C++
x: 807.676 y: 117.246 w: 256.247 h: 399.059
Python
x: 807.89813 y: 116.80127 w: 255.90338 h: 399.40616
Landmarks Detection model
C++
0: x: 903 y: 283
1: x: 856 y: 283
2: x: 981 y: 282
3: x: 1027 y: 279
4: x: 944 y: 346
5: x: 944 y: 373
6: x: 906 y: 363
7: x: 979 y: 361
8: x: 892 y: 420
9: x: 992 y: 418
10: x: 943 y: 402
11: x: 943 y: 447
12: x: 831 y: 256
13: x: 870 y: 234
14: x: 915 y: 249
15: x: 969 y: 248
16: x: 1013 y: 231
17: x: 1051 y: 252
18: x: 804 y: 285
19: x: 805 y: 322
20: x: 810 y: 358
21: x: 817 y: 393
22: x: 828 y: 427
23: x: 847 y: 459
24: x: 872 y: 486
25: x: 904 y: 506
26: x: 944 y: 513
27: x: 981 y: 505
28: x: 1011 y: 484
29: x: 1034 y: 457
30: x: 1052 y: 425
31: x: 1062 y: 391
32: x: 1069 y: 356
33: x: 1073 y: 319
34: x: 1074 y: 281
Python
0 : x: 906.0463315558754 y: 282.3578870739293
1 : x: 861.145753608238 y: 280.41236441835645
2 : x: 977.3058866649917 y: 281.0067285298137
3 : x: 1020.4552166107824 y: 278.4261315171025
4 : x: 945.6485149229011 y: 348.75262920354726
5 : x: 944.9155824425288 y: 380.5634290122107
6 : x: 908.2674060468216 y: 365.3378792597796
7 : x: 978.3399978426642 y: 364.65954855551536
8 : x: 893.7569865256628 y: 422.0812268902664
9 : x: 990.7621368931395 y: 418.60144540603505
10 : x: 944.522629316325 y: 409.4526732340455
11 : x: 944.7058975183863 y: 439.08888338941324
12 : x: 838.6659738548715 y: 250.02883242280222
13 : x: 876.46855749733 y: 225.04719686022145
14 : x: 918.066861004485 y: 243.57602577414946
15 : x: 967.81767015902 y: 242.74457307654666
16 : x: 1007.6137954169426 y: 223.89676966871775
17 : x: 1041.3190244318175 y: 247.90205745954154
18 : x: 810.6795244533491 y: 286.8375746126403
19 : x: 811.4277329382649 y: 328.7259260719293
20 : x: 815.1902236011733 y: 369.0622781247512
21 : x: 821.0280604516015 y: 409.1805689086759
22 : x: 831.1949880634147 y: 447.1738505738904
23 : x: 850.8629392922285 y: 479.66404880677874
24 : x: 877.2758198979009 y: 503.0393407546362
25 : x: 908.3212217860946 y: 518.3458455820364
26 : x: 945.2001498278696 y: 522.6027903104841
27 : x: 978.2252096425982 y: 515.5027811010368
28 : x: 1005.0076261606773 y: 498.1616400674975
29 : x: 1027.358926975874 y: 473.89989415534365
30 : x: 1043.546255808189 y: 441.688973135766
31 : x: 1052.1247442548665 y: 405.3227598034864
32 : x: 1057.2115244030829 y: 366.8379710496956
33 : x: 1060.253744983529 y: 328.09794247042737
34 : x: 1060.7353502434635 y: 286.69371988101193
We ran the inference using the attached sample image. Could you please try running your code on our example image and see whether you get the same results as ours?
If you do, please provide the sample image/video that gave you the 30-pixel difference so that we can replicate it.
Regards,
Megat
Hello Megat,
Thanks for the due diligence.
Yes, I am getting similar results on your test image.
In your outputs for point 33 you get:
C++: 33: x: 1073 y: 319
Python: 33: x: 1060.253744983529 y: 328.09794247042737
This is a difference of dx = 13 and dy = 9.
For some images I get a difference as large as 30 pixels.
Five points for your consideration as you improve your tools:
a. I was just reporting my observations, which I thought the software developers on the OpenVINO team at Intel would want to know about in their development efforts. What you do with it is your call, of course.
b. It matters in a chain of networks for gaze estimation, because the inaccuracies propagate and amplify; the pose estimation can end up off by a few degrees.
c. It is not so much the difference itself, but where does the difference come from? My understanding is that the backend is the same for both C++ and Python.
d. How do you measure accuracy? Is dx = 13 accurate enough? How far does dx/dy have to deviate before it is called inaccurate?
e. Is there an upper bound on the difference? As I said, I saw differences as high as 30 pixels before I reported my observations to you. Unfortunately, I cannot remember which image I used, so I cannot send it to you to replicate; the images were grayscale, for use in a driver-monitoring system. You can try a few different grayscale images that are available to you.
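The per-point gap for point 33 can be put in pixels with a two-line computation, using the values quoted from the two runs above:

```python
import numpy as np

# Point 33 as printed by the C++ and Python demos earlier in this thread
cpp = np.array([1073.0, 319.0])
py  = np.array([1060.253744983529, 328.09794247042737])

dxdy = np.abs(cpp - py)        # per-axis difference in pixels
dist = float(np.hypot(*dxdy))  # Euclidean distance, roughly 15.7 px here
print(dxdy, round(dist, 1))
```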
Hi Saeidn95,
Thank you for highlighting your findings to us.
We will convey this information to the appropriate teams for further clarification regarding this issue. We will also test other images, including grayscale images, to see whether we observe any large differences.
Regards,
Megat
Hi Saeidn95,
Thank you for the feedback and findings you provided.
We will continue to improve the OpenVINO™ Toolkit in future releases, and we appreciate your support. Thank you for your question. If you need any additional information from Intel, please submit a new question, as this thread is no longer being monitored.
Regards,
Megat
