Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all computer vision-related topics on Intel® platforms.

The inference result is not stable: sometimes it is correct, sometimes it becomes NaN

longfei98
Beginner

Hello, I am currently optimizing my own model with the OpenVINO Python API. When testing inference on a large amount of data, I find that some inputs produce results that are all NaN. But if I run inference on those same inputs again, the results are correct. In other words, the results are not stable.

Some tests I ran:

1. Model A: I first hit this issue with model A. It is a generic UNet with conv and deconv layers, and to improve performance I pruned some channels with the network slimming method. I then converted the PyTorch model to IR with FP16 and tested it (a rough sketch of the per-sample check is at the end of this post), which is where I found this issue. At first I thought it might be caused by the data type, but after changing to FP32 the same issue happened. Then I thought it might be caused by the pruning, so I ran the next test.

2. Model B: this is the same model without pruning. Similarly, I converted it to IR with FP16 and ran the tests. The issue seems harder to reproduce than with model A, but after testing a large amount of data it appeared again.

Now I am confused. Could anyone give any suggestions or ideas? Thank you very much.

 

 

OpenVINO version: 2020
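
For context, a rough sketch of the kind of per-sample NaN check involved, using the 2020-era IECore Python API (the IR paths, conversion commands, and input shape below are placeholders, not the actual model):

import numpy as np
from openvino.inference_engine import IECore

# Placeholder IR, e.g. produced via:
#   torch.onnx.export(model, dummy_input, "model.onnx")
#   python mo.py --input_model model.onnx --data_type FP16
MODEL_XML = "model.xml"
MODEL_BIN = "model.bin"

ie = IECore()
net = ie.read_network(model=MODEL_XML, weights=MODEL_BIN)
exec_net = ie.load_network(network=net, device_name="CPU")

input_name = next(iter(net.inputs))    # net.input_info on newer releases
output_name = next(iter(net.outputs))

# Placeholder input: a single sample with an assumed NCHW layout.
sample = np.random.rand(1, 1, 128, 128).astype(np.float32)

result = exec_net.infer(inputs={input_name: sample})[output_name]
if np.isnan(result).any():
    print("NaN detected in the output for this sample")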

 

 

longfei98
Beginner

Can anyone help? Thank you very much.

Iffa_Intel
Moderator

Greetings,


Have you tried cutting the dataset down to see whether the large dataset is causing the issue?

You may refer here:

https://docs.openvinotoolkit.org/2020.1/_docs_Workbench_DG_Download_and_Cut_Datasets.html



Sincerely,

Iffa




longfei98
Beginner

Thanks for your reply. 

I infer each data sample one by one. After the prediction for one sample is done, the process is killed and restarted to predict the next sample (roughly along the lines of the sketch below). So although I test a large amount of data, I do not predict everything at the same time.

Also, the dataset I used is local data, not an open-source dataset.
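
A sketch of that per-sample isolation (the worker script name and data layout here are placeholders, not the actual pipeline):

import subprocess
from pathlib import Path

DATA_DIR = Path("dataset")   # placeholder folder of single-sample files
WORKER = "infer_one.py"      # hypothetical script: loads the IR, infers one file, prints "NAN" if any NaN is found

for sample_path in sorted(DATA_DIR.glob("*.npy")):
    # Each sample gets a fresh Python process, so no state carries over between runs.
    proc = subprocess.run(["python", WORKER, str(sample_path)],
                          capture_output=True, text=True)
    if "NAN" in proc.stdout:
        print(f"NaN result for {sample_path}")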

longfei98
Beginner

I am wondering whether it is caused by memory overflow.

I use OpenVINO on CPU for inference. Is it possible to get a NaN result if the CPU memory usage is full?

I have run some tests to verify this assumption, but have not reproduced it yet.
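
One way to check this assumption is to log memory usage around each inference, for example with psutil (a sketch; the IR paths and the input are placeholders):

import numpy as np
import psutil
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")  # placeholder IR paths
exec_net = ie.load_network(network=net, device_name="CPU")
input_name = next(iter(net.inputs))
output_name = next(iter(net.outputs))

proc = psutil.Process()
sample = np.random.rand(1, 1, 128, 128).astype(np.float32)  # placeholder input

# Compare memory usage around the inference with whether the output contains NaN.
rss_before_mb = proc.memory_info().rss / 1e6
result = exec_net.infer(inputs={input_name: sample})[output_name]
rss_after_mb = proc.memory_info().rss / 1e6

print(f"RSS {rss_before_mb:.0f} -> {rss_after_mb:.0f} MB, "
      f"system memory used: {psutil.virtual_memory().percent}%, "
      f"NaN in output: {bool(np.isnan(result).any())}")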

Iffa_Intel
Moderator

Generally, NaN ("Not a Number") indicates an exception that usually occurs when an expression results in a number that cannot be represented.


Could you share your model with me to try it out, if possible?

Also, could you clarify which model, topology, and OpenVINO demo you used to run this inference?



Sincerely,

Iffa


longfei98
Beginner

Sorry for my late reply.

This is a commercially used model, so I'm afraid I cannot share it with you.

The backbone I used is a UNet structure from nnUNet (https://github.com/MIC-DKFZ/nnUNet) for segmentation. There are some self-defined layers, and I am not sure whether they are the cause.

I will try to create a similar model with a public dataset and reproduce the issue; that model can then be shared with you.

By the way, has anyone met a similar issue before? My colleague also hits this issue with a different model. It happens randomly and is hard to reproduce. Really strange.

Iffa_Intel
Moderator

Generally,


If you are using the supported models and feed them the correct inputs, these kinds of issues should not appear.

Let's say your program expects to receive numbers as input but receives strings instead.

That would definitely produce an error.


You may refer here: https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_TensorFlow.html


This documentation lists the model types and the officially validated supported topologies.

This is really important.
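
A quick sanity check along these lines is to validate the input array against the shape and type the IR expects before calling infer (a sketch; the file paths are placeholders):

import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")  # placeholder IR paths
input_name = next(iter(net.inputs))
expected_shape = net.inputs[input_name].shape

sample = np.load("sample.npy")  # placeholder input file

# The input must be numeric, finite, and match the shape the IR expects.
assert list(sample.shape) == list(expected_shape), \
    f"shape mismatch: got {sample.shape}, expected {expected_shape}"
assert np.issubdtype(sample.dtype, np.floating), \
    f"input must be a float array, got dtype {sample.dtype}"
assert np.isfinite(sample).all(), "input already contains NaN/Inf values"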


Sincerely,

Iffa



Iffa_Intel
Moderator

Greetings,


Intel will no longer monitor this thread since we have provided a solution. If you need any additional information from Intel, please submit a new question.


Sincerely,

Iffa

