Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Inference errors using NCS2



I am currently using two NCS2 devices with OpenVINO 2020.2 in production. The two MyriadX devices are on a miniPCIe card. The computer has an Intel Atom E3845 CPU and runs Windows 10.

The solution sequentially runs several instances of SSD MobileNetV2 networks from two concurrent threads (each thread performs inference with four models, one after the other; all eight models are loaded into memory at start-up) in near real time (about 8 fps). It works wonderfully for hours (sometimes even days) non-stop, although the CPU is working at between 80% and 100% most of the time.

The device's temperature never exceeds 45 °C (we are monitoring it), and if it does, we pause inference until it cools down. When CPU usage is over 99%, we also skip running inference.

The problem is that the software eventually crashes (rarely, after more than 24 hours), giving the exception: "Failed to queue inference: NC_ERROR".

In the Windows Event Viewer, the exception seems to happen in myriadPlugin.dll.

I can't be completely certain, but it looks like the issue is related to sustained high CPU usage.

Any ideas or suggestions as to how to fix this problem? The software needs to work 24/7, and these "random" crashes are killing us. Please help!

Best regards,






One workaround for your case (especially the NC_ERROR) is to clean up the NCS2 memory.

You may delete the exec_net object (del exec_net) once you are done with the stick; this will free up the stick automatically.


You may refer to this Inference Engine Python API Reference.


Another thing to consider is modifying your software design to suit your hardware. I believe you are well aware that the Atom is an older processor with limited computing power and memory. One way to optimize performance is to design the software so that it does not burden the CPU too much.


Normally, a program flow is synchronous: it executes an inference request and waits until the whole operation finishes. This wastes a lot of time and kills performance.

Instead, send an inference request and immediately prepare the next frame's inference request (or perform any other task) without waiting for the first to finish.


Now the preparation of frames and the inference operation run in parallel, since the inference request no longer blocks execution. This is known as Asynchronous Execution and can give a huge throughput improvement.

You may refer to the OpenVINO Asynchronous Inference Request.







Intel will no longer monitor this thread since we have provided a solution. If you need any additional information from Intel, please submit a new question. 


