Handling NCS2 USB connection error in python script

hmc_roboticslab · 05-13-2022

Hello,

We are trying to handle NCS2 USB connection errors in our Python code.

We are using OpenVINO 2021.4.2 with the inference_engine Python API.

We load and serve the model as follows:

1. Load IR file using IENetwork

2. Load IENetwork on device (MYRIAD) using IECore.load_network function

3. Infer through start_async function
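
For illustration, the three steps above can be sketched as follows. This is a minimal sketch, not the poster's actual code: it assumes the 2021.4 Python API, where `IECore.read_network` returns an `IENetwork`, and the helper name `load_and_start` and its parameters are hypothetical. The `ie` argument is expected to be an `IECore` instance created by the caller.

```python
def load_and_start(ie, model_xml, weights_bin, device_name, inputs):
    """Read an IR model, load it on one device, and start an async request.

    `ie` is assumed to be an openvino.inference_engine.IECore instance.
    It is passed in so this helper does not import OpenVINO itself.
    """
    # Step 1: read the IR files (.xml + .bin) into an IENetwork.
    net = ie.read_network(model=model_xml, weights=weights_bin)
    # Step 2: load the network on a device, e.g. "MYRIAD".
    exec_net = ie.load_network(network=net, device_name=device_name,
                               num_requests=1)
    # Step 3: start asynchronous inference on request slot 0.
    exec_net.start_async(request_id=0, inputs=inputs)
    return exec_net
```

A call might look like `load_and_start(IECore(), "model.xml", "model.bin", "MYRIAD", {"data": frame})`, where the file names and the input name `data` are placeholders.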

What we are trying to handle is the case of NCS2 USB connection failure.

Since our working environment is a moving robot, the USB connection is unstable.

So when a connection fails, we want the model to be loaded onto another idle NCS2 device.

(e.g., 3 NCS2 devices are connected and 2 models are being served, so 1 NCS2 device is idle.)

We use a timeout interrupt on the start_async() function to determine whether the NCS2 is alive.

If the timeout interrupt fires (say, no response for 5 seconds), all OpenVINO-related instances (e.g., IENetwork) are destroyed. We then re-create the serving model using steps 1, 2, and 3 above.
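
A rough sketch of this timeout-and-reload idea is shown below. It is not the poster's implementation: the function names, the polling approach, and the `failed` set are assumptions made for illustration. It relies only on `InferRequest.wait(0)` returning 0 when a result is ready (as used later in this thread) and on `IECore.load_network` raising `RuntimeError` when a device cannot be used.

```python
import time


def wait_until_done(request, timeout_s=5.0, poll_s=0.01):
    """Return True if the request finishes within timeout_s, else False.

    `request` is assumed to behave like an InferRequest: wait(0) returns
    immediately with a status code, and 0 means the result is ready.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if request.wait(0) == 0:
            return True
        time.sleep(poll_s)
    return False


def reload_on_next_device(ie, net, devices, failed):
    """Load `net` on the first device that is not known to have failed.

    Returns (exec_net, device_name). Devices that raise RuntimeError while
    loading are added to the `failed` set. Raises RuntimeError if no
    device can be used.
    """
    for dev in devices:
        if dev in failed:
            continue
        try:
            return ie.load_network(network=net, device_name=dev), dev
        except RuntimeError:
            failed.add(dev)
    raise RuntimeError("no usable device left")
```

When `wait_until_done` returns False, the caller would add the current device to `failed`, drop its references to the old executable network, and call `reload_on_next_device`.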

This works: the newly loaded network runs fine.

However, an unknown background process (or thread) keeps producing error messages like:

E: [xLink] [ 835528] [Scheduler00Thr] sendEvents:1132 Event sending failed
E: [global] [ 836527] [Scheduler00Thr] dispatcherEventSend:54 Write failed (header) (err -4) | event XLINK_WRITE_REQ

These messages fill the terminal and rarely stop.

My questions are:

1. Which thread or process produces these error messages?

Does it matter that this thread or process keeps running in the background?

2. How can we kill this background thread or process? I cannot find any Python API for this.

3. Is our method (timeout interrupt + reloading from scratch) the best way to handle NCS2 USB connection errors?

If not, is there an API to get the NCS2 connection status?
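
One possible starting point for the connection-status question is the `available_devices` property of `IECore`, which lists the device names the runtime currently reports. The sketch below is only an assumption about how it could be used: the helper name `idle_myriad_devices` and the `in_use` argument are hypothetical, and this thread does not confirm how quickly an unplugged stick disappears from the list or how a stick that is already running a network is reported.

```python
def idle_myriad_devices(ie, in_use):
    """Return MYRIAD device names reported by the runtime, minus `in_use`.

    `ie` is assumed to be an IECore instance; `in_use` is any collection
    of device names the application has already loaded a network on.
    """
    return [name for name in ie.available_devices
            if name.startswith("MYRIAD") and name not in in_use]
```

The result could feed the candidate list used when reloading a model after a failure.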

Thanks a lot,

Iffa_Intel · 05-15-2022

Greetings,

Looking at your use case, it is best to use the official OpenVINO Multi-Device Plugin. This plugin automatically assigns inference requests to the available devices so that the requests execute in parallel. If a higher-priority device fails or goes missing, the plugin falls back to another device.
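
As a small illustration, the Multi-Device Plugin is selected by passing a device name of the form `MULTI:<device>,<device>` to `load_network`, with devices listed in priority order. The helper below is a hypothetical convenience for building that string, and the device names in the usage note are examples only.

```python
def multi_device_name(devices):
    """Build a MULTI device string from device names in priority order."""
    if not devices:
        raise ValueError("at least one device is required")
    return "MULTI:" + ",".join(devices)
```

For example, `ie.load_network(network=net, device_name=multi_device_name(["MYRIAD.1.2-ma2480", "MYRIAD.1.4-ma2480"]))` would ask the plugin to use both sticks.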

You may refer here.

hmc_roboticslab · 05-15-2022

@Iffa_Intel

Thanks for your kind response!

It's very close to the method we were looking for!

We adopted the Multi-Device Plugin in our serving code and came up with a few additional questions.

1. Does the plugin use a master/slave structure?

Let's say we connected 2 Neural Compute Sticks.

We assigned these 2 devices to an executable network using the Multi-Device Plugin.

If device number 2 is disconnected, model serving still works (I guess because of the other device, number 1).

However, when device number 1 is disconnected, inference always fails and raises a runtime error, even though device number 2 is still connected.

Below is the error raised in this situation.

Traceback (most recent call last):
  File "/ssd/hq_patrol_vision/src/emergency_action/src/emergency_action_node.py", line 126, in main
    if stat.action_on and action_model.is_get_results():
  File "/ssd/hq_patrol_vision/src/emergency_action/src/action/action_wrapper.py", line 66, in is_get_results
    return self.model.is_get_results()
  File "/ssd/hq_patrol_vision/src/emergency_action/src/action/openvino/ncs_model.py", line 161, in is_get_results
    if self.exec_net.requests[req].wait(0) == 0:
  File "ie_api.pyx", line 1243, in openvino.inference_engine.ie_api.InferRequest.wait
  File "ie_api.pyx", line 1268, in openvino.inference_engine.ie_api.InferRequest.wait
RuntimeError: [ GENERAL_ERROR ]

Is this because device number 1 is the master node?
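
Whatever the cause turns out to be, the `RuntimeError` in the traceback above can be caught where the request is polled, so that the node can trigger a reload instead of crashing. The following is a minimal sketch under that assumption. The function name `poll_request` and the three status strings are hypothetical. The only API it uses is `InferRequest.wait(0)`, as in the traceback.

```python
def poll_request(request):
    """Classify one infer request as 'ready', 'pending', or 'failed'.

    `request` is assumed to behave like an InferRequest, where wait(0)
    returns 0 once the result is available.
    """
    try:
        status = request.wait(0)
    except RuntimeError:
        # The Python API surfaces plugin failures such as
        # "[ GENERAL_ERROR ]" as RuntimeError.
        return "failed"
    return "ready" if status == 0 else "pending"
```

A caller could treat `"failed"` the same way as a timeout and rebuild the executable network on the remaining devices.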

2. As mentioned in the original question, when a Neural Compute Stick is disconnected during serving, the following error messages are produced continuously and do not stop.

E: [xLink] [ 215984] [Scheduler00Thr] sendEvents:998 Event sending failed
E: [global] [ 216984] [Scheduler00Thr] dispatcherEventSend:53 Write failed (header) (err -4) | event XLINK_WRITE_REQ

This also happens with the Multi-Device Plugin (say, devices 1 and 2 are assigned and device 2 is disconnected during serving).

Is it safe for the threads or processes that produce these messages to remain in the background?

Thanks!

Iffa_Intel · 05-16-2022

Could you provide us with some information about your Multi-Device Plugin configuration (e.g., device priority assignment)?

There are 3 ways to specify the target devices for the Multi-Device Plugin, as mentioned here:

1. Pass a Prioritized List as a Parameter in ie.load_network()

2. Pass a List as a Parameter, and Dynamically Change Priorities during Execution. Note that the priorities of the devices can be changed in real time for the executable network.

3. Use Explicit Hints for Controlling Request Numbers Executed by Devices
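
As an illustration of option 2, the sketch below moves a failed device to the end of the priority list and applies the new order. It assumes that the `MULTI_DEVICE_PRIORITIES` configuration key and `IECore.set_config` with `device_name="MULTI"` behave as the 2021.4 Multi-Device Plugin documentation describes, so check this against the installed version. The helper names are hypothetical, and the thread does not confirm that reordering priorities resolves the failure discussed here.

```python
def demote_device(priorities, failed_device):
    """Return a new priority list with failed_device moved to the end."""
    reordered = [d for d in priorities if d != failed_device]
    if failed_device in priorities:
        reordered.append(failed_device)
    return reordered


def apply_priorities(ie, priorities):
    """Send a new device priority order to the MULTI plugin.

    `ie` is assumed to be an IECore instance.
    """
    ie.set_config(
        config={"MULTI_DEVICE_PRIORITIES": ",".join(priorities)},
        device_name="MULTI",
    )
```

For example, after a failure on the first stick, `apply_priorities(ie, demote_device(current, "MYRIAD.1.2-ma2480"))` would place the remaining stick first.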

As for the warnings, you can ignore them as long as no error is produced.

Sincerely,

Iffa

hmc_roboticslab · 05-23-2022

Hi Iffa,

Sorry for the late reply.

Our implementation is as follows:

all_devices = "MULTI:MYRIAD.1.2-ma2480,MYRIAD.1.4-ma2480"
exec_net = ie.load_network(network=net, device_name=all_devices)

This is the first option in your answer.

Based on your answer, we suspect that device priority is the reason inference fails when the USB connection of the first device (MYRIAD.1.2-ma2480) fails.

In our case, inference still works when the MYRIAD.1.4-ma2480 device is detached (although it produces warning messages).

However, when MYRIAD.1.2-ma2480 (the first device) is detached, inference fails permanently, even though the second device is still connected.

Is the priority of the first device to blame?

I also wonder whether this can be prevented by dynamically changing priorities (the second option in your answer).

Thanks for answering!

Iffa_Intel · 05-24-2022

It is recommended to try out option number 2 in your use case.

You may refer to the official documentation that I shared previously for the instructions.

Sincerely,

Iffa

Iffa_Intel · 05-31-2022

Greetings,

Intel will no longer monitor this thread since we have provided a solution. If you need any additional information from Intel, please submit a new question.

Sincerely,

Iffa
