Error Handling InferencePlugin

Sergeda__Paulius · 04-16-2019

Hello,

I am building an application from modified sample code and am currently working on this case: somebody pulls out the NCS2 after the networks have been loaded, either during or between inference calls. My goal is for the program to simply terminate, or at least throw some sort of exception. In some cases the program does exactly that, but most of the time it just prints this text:

E: [xLink] [ 771396] dispatcherEventSend:889 Write failed header -4 | event USB_WRITE_REQ

E: [xLink] [ 771396] dispatcherEventReceive:308 dispatcherEventReceive() Read failed -4 | event 0x7fffc14bba40

E: [xLink] [ 771396] eventReader:256 eventReader stopped

After that, what I assume is on every detector call, the program prints and continues:

E: [watchdog] [ 482694] sendPingMessage:164 Failed send ping message: X_LINK_ERROR.

To get back to my question: is there any way to reload the plugin, or at least get an exception when this happens? Any information related to this case would be much appreciated.

EDIT: "After that, what I assume is on every detector call, the program prints and continues:" - this is wrong. It seems that the program actually halts, and the messages are most likely coming from libinference_engine.so. My main issue now is that these messages don't show up when I read stdout or stderr from a Python subprocess shell, while every other message does.
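For reference, this is roughly how both streams of a child process can be read together from Python. It is only a minimal sketch: `run_and_capture` is a hypothetical helper name, and the child command in the usage comment is a stand-in, not the real application.

```python
import subprocess
import sys


def run_and_capture(cmd, timeout=None):
    """Run cmd and return (exit_code, combined_output).

    stderr is merged into the stdout pipe, so text that native code writes
    directly to either file descriptor ends up in the same string.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the stdout pipe
        timeout=timeout,
    )
    return proc.returncode, proc.stdout.decode(errors="replace")


# Usage with a stand-in child that writes to both descriptors:
# child = [sys.executable, "-c",
#          "import os; os.write(1, b'out\\n'); os.write(2, b'err\\n')"]
# code, output = run_and_capture(child)
```

If a message is still missing from the combined output, it is probably not being written to the child's stdout or stderr at all.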

Shubha_R_Intel · 04-17-2019

Dear Sergeda, Paulius,

I hope you understand that NCS2 is not guaranteed to recover from failure due to pulling it out in the middle of inference as you describe.

Thanks,

Shubha

Sergeda__Paulius · 04-18-2019

Yes, I understand. Reading it again, my question is not very clear - sorry about that!

I don't want to recover from a failure at all. In fact, I would be more than happy if the library crashed, or threw an IO/Stream/Pipe/Something exception instead of getting stuck in a loop trying to ping the NCS2. Why would it keep running if there obviously is no way to recover?

But if that kind of robustness is just not feasible, is there anything I could do to stop the library from printing these error messages to stdout? Not only does that seem like bad practice in general, but in my case, random error messages showing up in the console are really detrimental.

There are solutions to both issues, but they seem to be reaaaally sketchy at first glance, unless I am missing something completely obvious.
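For the first issue, one such workaround would be to run inference in a separate worker process and kill it from the outside when it stops responding. A rough sketch, assuming the inference code can be started as its own command; `run_with_deadline` and `InferenceFailed` are hypothetical names, not part of the Inference Engine:

```python
import subprocess
import sys


class InferenceFailed(RuntimeError):
    """Raised when the worker times out or exits with an error code."""


def run_with_deadline(cmd, deadline_s):
    """Run cmd and return its stdout bytes, or raise InferenceFailed."""
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=deadline_s)
    except subprocess.TimeoutExpired as exc:
        # subprocess.run() kills the child before re-raising the timeout.
        raise InferenceFailed(f"no result within {deadline_s} s") from exc
    if proc.returncode != 0:
        raise InferenceFailed(f"worker exited with code {proc.returncode}")
    return proc.stdout


# Usage with a stand-in worker command (the real one is application-specific):
# output = run_with_deadline([sys.executable, "-c", "print('done')"], 30)
```

The cost is that the networks have to be reloaded every time the worker is restarted, which is part of why this feels sketchy.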

To speculate from some past threads, you definitely aren't making libinference_engine.so open-source :( In that case, maybe, make some sort of config for it? Or maybe just print the error messages to stderr instead of stdout?

Thank you for taking the time to answer :) If you have ANY additional information or anything related to my woes, please share it with me!

Kenneth_C_Intel · 04-22-2019

Hi, I tried to reproduce this and my application errors out in ~1 second. It would either time out or die instantly when I reconnected the NCS2 device. Could you help me recreate the error that you are experiencing? I fear it may be more of a bug than an implementation issue.

Regards,

Ken

Sergeda__Paulius · 04-23-2019

Well, it's embarrassing, but after looking at the issue with a fresh mind after the holidays, it seems I was STILL wrong :D The library does NOT in fact print to stdout; I was simply receiving the error messages as a string when fetching results from the detectors after pulling out the stick. The application still doesn't crash completely though, it just keeps sending these messages on result fetch calls. Would there still be a need to try to recreate the issue? And thank you very much for responding!
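Since the messages arrive as strings on result fetch calls, the application can at least turn them into an exception itself. A minimal sketch, assuming the fetch call hands back either a normal result object or the error text as `str`/`bytes`; `check_result`, `DeviceLostError` and `ERROR_MARKERS` are hypothetical names, and the markers are simply taken from the log lines quoted above:

```python
# Markers taken from the log lines quoted earlier in this thread.
ERROR_MARKERS = ("X_LINK_ERROR", "[xLink]", "[watchdog]")


class DeviceLostError(RuntimeError):
    """Raised when a fetched result looks like a device error message."""


def check_result(result):
    """Return result unchanged, or raise if it is an error string."""
    if isinstance(result, (bytes, bytearray)):
        text = result.decode(errors="replace")
    elif isinstance(result, str):
        text = result
    else:
        return result  # e.g. a list or array of detections: pass through
    if any(marker in text for marker in ERROR_MARKERS):
        raise DeviceLostError(text.strip())
    return result
```

Wrapping every fetch in `check_result(...)` would let the caller terminate on the first `DeviceLostError` instead of continuing to poll the missing stick.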

Shubha_R_Intel · 04-24-2019

Dear Sergeda, Paulius,

Join the club! Embarrassment seems to be a default state when debugging. I feel it often!

And in response to your question:

"The application still doesn't crash completely though, just keeps sending these messages on result fetch calls. Would there still be a need to try to recreate the issue?"

I don't think you need to recreate the issue - the behavior you describe is normal and expected.

Shubha