xie__yuan
Beginner
184 Views

Suddenly cannot receive any results from NCS2 in async mode

Hi, I'm trying to build a complete face-recognition pipeline using the NCS2. I chose MobileNet-SSD for detection and SphereFace for recognition. I found that with the IEPlugin API I cannot load more than one NCS2, or more than two NCS1 devices (I assumed one plugin is an abstraction for one device, but I'm not sure that's right). So I have to bind both the SSD net and the SphereFace net to a single NCS2, and I'm experimenting with the async API provided by OpenVINO. For each net, I set num_requests to 2. The processing loop is roughly as follows:

while True:

        if there is a free slot for an infer request on the NCS:

                submit an infer request with the StartAsync API

        check the status of previously submitted infer requests with the Wait API; if the status == 0:

                process the result...
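The loop above can be sketched in Python as a fixed pool of request slots that are polled without blocking. To keep the sketch runnable without hardware, the device is simulated: `FakeRequest` is a hypothetical stand-in whose `wait(0)` returns 0 (OK) once a result is ready and -9 (RESULT_NOT_READY) otherwise, mirroring the status codes discussed in this thread. It is not the OpenVINO API itself, only an illustration of the control flow.

```python
from collections import deque

OK = 0
RESULT_NOT_READY = -9  # Inference Engine status code for "result not ready"


class FakeRequest:
    """Hypothetical stand-in for one async infer request slot on the device."""

    def __init__(self, latency_polls):
        self.latency_polls = latency_polls  # polls needed before a result is ready
        self.remaining = 0
        self.busy = False
        self.frame = None

    def start_async(self, frame):
        self.frame = frame
        self.remaining = self.latency_polls
        self.busy = True

    def wait(self, timeout=0):
        # Non-blocking poll: count down until the simulated result is ready.
        if self.remaining > 0:
            self.remaining -= 1
            return RESULT_NOT_READY
        return OK


def run_pipeline(frames, num_requests=2, latency_polls=1):
    """Keep up to num_requests frames in flight; collect results in submission order."""
    if num_requests < 1:
        raise ValueError("num_requests must be >= 1")
    slots = [FakeRequest(latency_polls) for _ in range(num_requests)]
    pending = deque(frames)
    in_flight = deque()  # slot indices, oldest submission first
    results = []
    while pending or in_flight:
        # 1. Submit new work to every free slot.
        for i, slot in enumerate(slots):
            if not slot.busy and pending:
                slot.start_async(pending.popleft())
                in_flight.append(i)
        # 2. Poll the oldest in-flight request; process it only when status is OK.
        oldest = slots[in_flight[0]]
        if oldest.wait(0) == OK:
            results.append(oldest.frame)
            oldest.busy = False
            in_flight.popleft()
    return results
```

Polling the oldest request first keeps results in frame order. With real hardware, the corresponding calls in the Inference Engine Python API are `ExecutableNetwork.start_async()` and `InferRequest.wait()`.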

The code works at first. But after running for a long time (for example, 2 hours), the status returned by the Wait API never becomes 0 again; it stays at -9 forever. The docs (https://docs.openvinotoolkit.org/R5/ie__common_8h.html#a2ce897aa6a353c071958fe379f5d6421) describe -9 as RESULT_NOT_READY, meaning the result is not ready yet.
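One way to keep such a loop from spinning forever on -9 is a watchdog that records when each request was submitted and flags it once it has stayed not-ready past a deadline, so the application can react (for example, by reloading the network or resetting the device). This is a minimal sketch: `StallWatchdog` is a hypothetical helper, the 5-second deadline is an illustrative assumption rather than a documented limit, and the clock is injectable so the logic can be checked without hardware.

```python
import time

OK = 0
RESULT_NOT_READY = -9


class StallWatchdog:
    """Flags infer requests that stay RESULT_NOT_READY longer than a deadline."""

    def __init__(self, deadline_s=5.0, clock=time.monotonic):
        self.deadline_s = deadline_s  # illustrative value, not a documented limit
        self.clock = clock
        self.submitted_at = {}  # request id -> submission timestamp

    def on_submit(self, request_id):
        self.submitted_at[request_id] = self.clock()

    def on_status(self, request_id, status):
        """Return True if the request should be treated as hung."""
        if status == OK:
            self.submitted_at.pop(request_id, None)
            return False
        if status == RESULT_NOT_READY:
            started = self.submitted_at.get(request_id)
            return started is not None and self.clock() - started > self.deadline_s
        # Any other non-zero status code is an error in its own right.
        return True
```

Feeding `on_status()` the value returned by each non-blocking wait turns a silent hang into an explicit event the caller can act on.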

I'm running Ubuntu 16.04 on an i7-7700K. If I set num_requests to 4, the problem appears much earlier (after about 2 minutes); if I set num_requests to 1, which is effectively the same as sync mode, the problem does not appear.

Has anyone experienced this before, or does anyone have clues about this problem? Is it because I load two different nets onto one NCS2 and run them at the same time, is it a heat-dissipation problem, or is it a problem with async mode?

18 Replies
lilohuang
New Contributor I

I can reproduce a similar hang, even in sync mode, during a 12-hour long-running test. The wait() API hangs, and then I see this error: https://software.intel.com/en-us/forums/computer-vision/topic/805843 Hopefully it's not an overheating issue. Heavy sigh.

Hyodo__Katsuya
Innovator

I have known about that phenomenon for quite a long time.

Performance and sustained running time are a trade-off.

PINTO = Hyodo, Katsuya.

https://ncsforum.movidius.com/discussion/comment/3631/#Comment_3631

https://github.com/PINTO0309/OpenVINO-EmotionRecognition/issues/1

lilohuang
New Contributor I

Yes, PINTO, that's why I bought 2 more NCS2 devices (I now have 3) to mitigate the issue and for failover. But I'm worried there is a bug in the USB driver or somewhere else, rather than overheating. It would be great if OpenVINO provided APIs to query NCS2 hardware status, including temperature and utilization! I also believe the NCS2 is NOT a prototype and should be usable for 24x7 workloads without an active cooling system (e.g., CCTV). I can't even feel a high temperature with my fingers on the NCS2 device when the error occurs. We need some feedback from Intel experts. Thank you.

Hyodo__Katsuya
Innovator

>It would be great if OpenVINO can provide APIs for detecting the NCS2 hardware status including temperature and utilization! I also believe NCS2 is NOT a prototype

I agree with you.

I'm worried that there is still a problem with the OpenVINO API.

xie__yuan
Beginner

Thank you both! Actually, I'm the one who opened the issue in PINTO's repo, haha.

Besides the heat problem, I also got an electric shock when I touched the NCS2 to feel its temperature, and the NCS2 stopped working. I think ESD is a problem too.

@lilohuang, one more question: does it work when you plug 3 NCS2 devices into your UP Board? Can you run it 24x7 now? I'm using a Raspberry Pi 3B+, and I found that the Pi showed heat-dissipation problems even earlier than the NCS2... sigh.

@Hyodo, Katsuya, in https://github.com/PINTO0309/OpenVINO-EmotionRecognition/issues/1 you said NCSDK is more suitable for long-running operation. I found the answer you posted in the NCSDK forum. Do you mean NCSDK provides an API to get the device temperature, so you can monitor it to decide when to put the processing thread to sleep?

Hyodo__Katsuya
Innovator

@xie, yuan

First, here is an overview of the algorithm I came up with.

https://github.com/PINTO0309/MobileNet-SSD-RealSense#simple-clustering-function-multistick--multiclu...

Next, look at the implementation.

I implemented a function that measures the internal temperature of the NCS and automatically stops inference when a certain temperature is reached, plus an algorithm that automatically pauses at fixed intervals.

Unfortunately, it only works with NCSDK.

Also, NCSDK does not support NCS2.

https://github.com/PINTO0309/MobileNet-SSD-RealSense/blob/master/MultiStickSSDwithRealSense.py
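The throttling idea PINTO describes (stop inference at a temperature threshold, and also pause at fixed intervals) can be expressed independently of any SDK. This is a minimal sketch under stated assumptions: `read_temperature` is a hypothetical callable standing in for whatever temperature query the SDK offers (per this thread, only NCSDK exposes one, and NCSDK does not support the NCS2), and the threshold, resume, and interval values are illustrative, not recommended settings. It is not PINTO's actual implementation.

```python
class ThermalThrottle:
    """Decides whether to run inference, based on temperature and a duty cycle."""

    def __init__(self, read_temperature, stop_at_c=70.0, resume_at_c=60.0,
                 run_frames=300, rest_frames=30):
        # All numeric defaults are illustrative assumptions.
        self.read_temperature = read_temperature  # hypothetical sensor callable
        self.stop_at_c = stop_at_c
        self.resume_at_c = resume_at_c
        self.run_frames = run_frames    # frames to run before a scheduled rest
        self.rest_frames = rest_frames  # frames to skip during a scheduled rest
        self.overheated = False
        self.tick = 0

    def should_infer(self):
        temp = self.read_temperature()
        # Hysteresis: stop at stop_at_c, resume only once at or below resume_at_c.
        if temp >= self.stop_at_c:
            self.overheated = True
        elif temp <= self.resume_at_c:
            self.overheated = False
        if self.overheated:
            return False
        # Periodic pause: run for run_frames, then rest for rest_frames.
        phase = self.tick % (self.run_frames + self.rest_frames)
        self.tick += 1
        return phase < self.run_frames
```

When `should_infer()` returns False, the caller would skip or delay the frame (for example, by sleeping briefly) instead of submitting it. The gap between the stop and resume temperatures avoids rapid on/off toggling around a single threshold.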

lilohuang
New Contributor I

@xie, yuan So far I haven't had a chance to run the long-running test with 3 NCS2 devices. I will post an update next week if possible. Thanks.

lilohuang
New Contributor I

@Sahira_at_Intel provided feedback on my post https://ncsforum.movidius.com/discussion/1619/watchdog-0-sendpingmessage-164-failed-send-ping-message-x-link-error#latest FYR. 

lilohuang
New Contributor I

@xie, yuan Today I got a chance to run two NCS2 devices on my UP Board computer. The two NCS2 devices were connected through an AC-powered USB hub to the UP Board's USB 3 port. I switched the active NCS2 device every 10 seconds, so the inactive one could rest while the active one was working. It worked well at first, but sadly the same error occurred after a long-running test (> 1 hr)...

e.g.

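import cv2

# YoloObjectDetector is lilohuang's own wrapper class (not shown); each instance
# loads its own IEPlugin/IENetwork onto a separate NCS2 device.
yolo_active = YoloObjectDetector()
yolo_inactive = YoloObjectDetector()

cap = cv2.VideoCapture(0)  # assumed camera index 0; the original omitted the source
cap.set(cv2.CAP_PROP_FPS, 30)
SWITCH_EVERY = 300  # 300 frames at 30 FPS is roughly 10 seconds per device
processed_frames = 0

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # inferencing and rendering code using yolo_active ... omitted
    processed_frames += 1
    if processed_frames == SWITCH_EVERY:
        processed_frames = 0
        # Swap roles so the previously active NCS2 can rest.
        yolo_active, yolo_inactive = yolo_inactive, yolo_active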
xie__yuan
Beginner

@lilohuang thank you for the update! 

As far as I know, OpenVINO automatically allocates tasks to all plugged-in NCS2 devices, so I suspect the code won't work as you designed it; both NCS2 devices may be running simultaneously. By the way, are you using sync mode or async mode? If async, what num_requests do you set? My code can run for a whole night if I set num_requests to 1 with two NCS2 devices plugged in, but that's on an Ubuntu 16.04 laptop.

lilohuang
New Contributor I

No. I noticed that OpenVINO won't automatically allocate tasks to all plugged-in NCS2 devices. If you plug in two NCS2 devices, you have to load the plugin and network twice. My YoloObjectDetector can run inference in a worker thread or a worker process, depending on configuration; worker-process mode is the default, to avoid Python GIL contention. I tested it in async mode (num_requests = 2) today, and the error is reproducible.

You can also refer to the examples by Yuanyuan L. (Intel), https://github.com/yuanyuanli85/open_model_zoo/blob/ncs2/demos/python_demos/multiple_device_ncs2_asy... and by PINTO, https://github.com/PINTO0309/OpenVINO-YoloV3/blob/master/openvino_tiny-yolov3_MultiStick_test.py Both instantiate IEPlugin and IENetwork multiple times, once per NCS2 device.
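The pattern in those examples (one plugin and one loaded network per NCS2, with frames spread across the sticks) can be sketched as a round-robin dispatcher. In this minimal sketch each executor is any callable that takes a frame; in a real application it would wrap one per-device plugin/network pair. `MultiStickDispatcher` and `make_fake_executor` are hypothetical names, not part of OpenVINO.

```python
from itertools import cycle


class MultiStickDispatcher:
    """Round-robins frames across per-device executors (one per NCS2)."""

    def __init__(self, executors):
        if not executors:
            raise ValueError("need at least one executor")
        self.executors = list(executors)
        self._next = cycle(range(len(self.executors)))

    def submit(self, frame):
        # Pick the next device in turn and run the frame on it.
        idx = next(self._next)
        return idx, self.executors[idx](frame)


def make_fake_executor(device_name):
    # Stand-in for "IEPlugin + IENetwork loaded on one device"; returns a tag.
    return lambda frame: f"{device_name}:{frame}"
```

Round-robin keeps every stick equally loaded; a scheme like lilohuang's, which rests one stick at a time, would instead swap which executor is used every N frames.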

xie__yuan
Beginner

I'm also confused about that, because the model should be assigned manually. But I also found that if I plug in 2 NCS2 devices and assign just one model to the IEPlugin, both NCS2 devices heat up, and if I unplug either one, the program fails. Doesn't that mean all the NCS2 devices are working?

lilohuang
New Contributor I

@xie, yuan To be honest, I don't know how it works under the hood; perhaps @Yuanyuan L. (Intel) can describe the details. From my observation, the USB device name changes from "Movidius Myriad X" to "VSC Loopback Device" when an NCS2 device is working. I can see only one of my two NCS2 devices change from "Movidius Myriad X" to "VSC Loopback Device" if I instantiate IEPlugin and IENetwork once.

On the other hand, I see multiple "VSC Loopback Device" entries if I instantiate IEPlugin and IENetwork multiple times, once per NCS2 device, as @Yuanyuan L. (Intel) and @PINTO did.

[Screenshot: ncs2.jpg]
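lilohuang's observation suggests a quick sanity check: count how many sticks report each USB product name. This minimal sketch assumes you already have the product-name strings (for example, copied from the OS device listing; how to obtain them is platform-specific and left out). It only classifies them using the two names quoted above: "Movidius Myriad X" for an idle stick and "VSC Loopback Device" for one that is in use. `classify_ncs2` is a hypothetical helper.

```python
def classify_ncs2(product_names):
    """Count NCS2 sticks by the USB product name each one reports."""
    counts = {"idle": 0, "in_use": 0, "other": 0}
    for name in product_names:
        # Normalize so "Movidius Myriad X" and "Movidius MyriadX" both match.
        key = name.replace(" ", "").lower()
        if "vscloopbackdevice" in key:
            counts["in_use"] += 1
        elif "movidiusmyriadx" in key:
            counts["idle"] += 1
        else:
            counts["other"] += 1
    return counts
```

If only one stick shows up as in use after loading the network once, that matches lilohuang's finding that each NCS2 needs its own plugin and network instance.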

xie__yuan
Beginner

That's great evidence! Thank you for the info!

Maybe the best solution for now is to use NCS1 with NCSDK to monitor the device temperature.

lilohuang
New Contributor I

@xie, yuan @PINTO My issue seems to be resolved after I bought a brand-new 7-port AC-powered USB hub ($40) to replace my previous 4-port AC-powered USB hub ($30). The new 7-port hub has a LARGER AC adapter than the 4-port hub. I'm not 100% sure, but the X_LINK_ERROR seems to be related to USB controller chip compatibility or USB power stability. It's still running after 12 hours of torture testing without any errors. Thanks.

FYR, https://ncsforum.movidius.com/discussion/1619/watchdog-0-sendpingmessage-164-failed-send-ping-messag...

xie__yuan
Beginner

That's good news! I just ran into the same problem after switching to a different USB hub. That hub cannot support more than one NCS2.

lilohuang
New Contributor I

I'm now using a TP-LINK UH720 USB hub with two NCS2 devices connected simultaneously. No errors occurred after 18 hours of heavy-load inference. Just FYR. Thanks!
