My application needs to run 24/7 at a remote location, and I'm looking for ways to recover when it gets stuck. The typical causes are temperature and USB power. lilohuang provides a good solution for these issues at https://software.intel.com/en-us/forums/computer-vision/topic/805843#comment-1936548. However, I still want a good recovery method built in.
The typical error message when it gets stuck is "E: [ncAPI] [ 0] ncGraphQueueInference:3538 Can't send trigger request".
The message keeps repeating until the program is restarted.
I'm looking for ways to detect this condition and reset the NCS2 to restart the process. What is the best way to accomplish this?
I tried resetting/re-creating the inference requests:
async_infer_request_next = network.CreateInferRequestPtr();
async_infer_request_curr = network.CreateInferRequestPtr();
I'm not able to get it working. The message keeps showing up. Is there any way to reset the thread that produces the message?
Environment: NCS2, OpenVINO R5, Windows 10, object_detection_demo_ssd_async.
Any thoughts and suggestions would be greatly appreciated.
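One way to approach the "detect this condition" part is to count consecutive inference failures in the application loop and treat a run of them as a stuck device. The following is only a minimal sketch under that assumption: `FailureTracker` and its threshold are hypothetical names, not part of OpenVINO, and what counts as a "failure" (a non-OK status, a timeout, an exception) is left to the caller.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an OpenVINO class): tracks consecutive inference
// failures and reports when a caller-chosen threshold has been reached.
class FailureTracker {
public:
    explicit FailureTracker(std::size_t threshold) : threshold_(threshold) {
        assert(threshold > 0 && "threshold must be at least 1");
    }

    // Call once after every inference attempt. A success clears the counter;
    // a failure increments it. Returns true once the threshold is reached.
    bool record(bool succeeded) {
        if (succeeded) {
            consecutive_failures_ = 0;
        } else {
            ++consecutive_failures_;
        }
        return needs_restart();
    }

    bool needs_restart() const { return consecutive_failures_ >= threshold_; }
    std::size_t consecutive_failures() const { return consecutive_failures_; }

private:
    std::size_t threshold_;
    std::size_t consecutive_failures_ = 0;
};
```

When `record` returns true, the application could exit with a non-zero code and leave the restart to an external supervisor (a service manager or a wrapper script), since re-creating the inference requests in-process did not clear the error in the case described above.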
Dear Terry, please download the latest OpenVINO 2019 R1, which was just released today. It has NCS2 improvements. However, even in 2019 R1 there is no guarantee that failure recovery is possible.
Thanks for using OpenVINO!
Thanks for letting me know about R1. I went through the release notes and was not able to find anything related to recovery. Can you be more specific about the approach or method I can try for recovery?
Specifically, is there any way to stop the task/thread that generates the messages?
I need to find a solution for this before I can release the product for use at a remote site.
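If the thread that prints the messages cannot be stopped from application code, one workaround is a heartbeat check: the main loop records a timestamp after each processed frame, and a separate watchdog thread ends the whole process when no heartbeat has arrived for too long. The sketch below covers only the stall decision and takes the time as a parameter so it can be checked deterministically. `HeartbeatMonitor` and the timeout value are assumptions made for illustration, not OpenVINO API.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

// Hypothetical helper: decides whether the main loop has stalled, based on
// the time elapsed since the last heartbeat. Safe to call from two threads.
class HeartbeatMonitor {
public:
    using Clock = std::chrono::steady_clock;

    HeartbeatMonitor(Clock::duration timeout, Clock::time_point start)
        : timeout_(timeout), last_beat_(start) {
        assert(timeout > Clock::duration::zero());
    }

    // Called by the main loop after each successfully processed frame.
    void beat(Clock::time_point now) {
        std::lock_guard<std::mutex> lock(mutex_);
        last_beat_ = now;
    }

    // Called periodically by the watchdog thread. True means the last
    // heartbeat is older than the timeout.
    bool is_stalled(Clock::time_point now) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return now - last_beat_ > timeout_;
    }

private:
    mutable std::mutex mutex_;
    Clock::duration timeout_;
    Clock::time_point last_beat_;
};
```

In a real application the watchdog thread would pass `HeartbeatMonitor::Clock::now()` and, on a stall, terminate the process (for example with `std::abort()`) so that an external supervisor can start a fresh one. Ending the process is a blunt way to stop the message-producing thread, but it does not depend on the device or plugin cooperating.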
I have more or less the same problem. I have my Intel NCS2 in a remote place. I live in Ecuador, where the temperature can go above 30°C at midday. I sometimes have problems with the NCS2, which displays this message in the console:
E: [watchdog] [ 432134] sendPingMessage:164 Failed send ping message: X_LINK_ERROR
I don't really know if this is due to temperature.
Have you guys made any progress in handling these problems?
I am working with a Raspberry Pi 3, so I managed to make it reboot every time the script goes down because of high temperature (after first waiting a few minutes, of course). But the sendPingMessage failure message just keeps printing without ending the script. The number "432134" was the printed line number, so you can see that this goes on for a long time.
Any help would be amazing.
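Because the script keeps running while the error is printed, a reboot-on-exit setup never triggers. One possible workaround is to have a supervisor read the script's console output and react to the error text itself. The sketch below shows only the matching step, using the two messages quoted in this thread. The function name `is_device_error_line` is hypothetical, and the exact message wording may differ between OpenVINO releases.

```cpp
#include <array>
#include <string>

// Returns true if a console line contains one of the device error texts
// quoted in this thread. The markers are plain substrings, so the bracketed
// counter and source line number in the log prefix do not matter.
bool is_device_error_line(const std::string& line) {
    static const std::array<const char*, 2> kMarkers = {{
        "Can't send trigger request",
        "Failed send ping message: X_LINK_ERROR",
    }};
    for (const char* marker : kMarkers) {
        if (line.find(marker) != std::string::npos) {
            return true;
        }
    }
    return false;
}
```

A supervisor could call this on each line it reads from the script and, after a few matches in a row, stop the script and trigger the existing wait-then-reboot procedure.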