Thank you for the YOLO and Facenet support in R3. The Model Optimizer runs fine, and execution in both FP16 and FP32 is smooth on GPU devices (clDNN).
The one issue we are experiencing is with FP32 on the CPU device (MKL-DNN plug-in): we get various crashes on both Windows and Linux. Is this a supported configuration?
Hi Nikos,
Yes, FP32 is supported on CPU. Which crashes exactly do you experience? Can you report them here?
Best,
Severine
Hi Severine, thank you for confirming that FP32 CPU inference of YOLO and Facenet is supported. It seems I am having trouble properly linking/using the required intel64/libcpu_extension*so, and loading the YOLO / Facenet networks fails on both Windows and Ubuntu. When I debug the crash I do not get meaningful information because I do not have symbols. Let me investigate a bit more and update soon.
The problem occurs only on the CPU path and is related to output/YoloRegion: if I remove that layer, CPU runs fine too.
The -d GPU (clDNN) path works fine. For example:
./object_detection_demo_ssd_async -m tiny-yolo.xml -i test.mp4 -d GPU -pc
performance counts:
0-convolutional EXECUTED layerType: Convolution realTime: 1215 cpu: 3 execType: convolution_gpu_bfyx_os_iyx_osv16
11-maxpool EXECUTED layerType: Pooling realTime: 49 cpu: 2 execType: pooling_gpu_bfyx_block_opt
12-convolutional EXECUTED layerType: Convolution realTime: 1312 cpu: 2 execType: convolution_gpu_bfyx_os_iyx_osv16
14-maxpool EXECUTED layerType: Pooling realTime: 37 cpu: 3 execType: pooling_gpu_bfyx_block_opt
15-convolutional EXECUTED layerType: Convolution realTime: 1299 cpu: 2 execType: convolution_gpu_bfyx_os_iyx_osv16
17-maxpool EXECUTED layerType: Pooling realTime: 34 cpu: 2 execType: pooling_gpu_bfyx_block_opt
18-convolutional EXECUTED layerType: Convolution realTime: 4119 cpu: 2 execType: convolution_gpu_bfyx_os_iyx_osv16
2-maxpool EXECUTED layerType: Pooling realTime: 642 cpu: 2 execType: pooling_gpu_bfyx_block_opt
20-convolutional EXECUTED layerType: Convolution realTime: 8201 cpu: 2 execType: convolution_gpu_bfyx_os_iyx_osv16
22-convolutional EXECUTED layerType: Convolution realTime: 102 cpu: 2 execType: convolution_gpu_bfyx_os_iyx_osv16
3-convolutional EXECUTED layerType: Convolution realTime: 1243 cpu: 2 execType: convolution_gpu_bfyx_os_iyx_osv16
5-maxpool EXECUTED layerType: Pooling realTime: 162 cpu: 2 execType: pooling_gpu_bfyx_block_opt
6-convolutional EXECUTED layerType: Convolution realTime: 1164 cpu: 2 execType: convolution_gpu_bfyx_os_iyx_osv16
8-maxpool EXECUTED layerType: Pooling realTime: 90 cpu: 2 execType: pooling_gpu_bfyx_block_opt
9-convolutional EXECUTED layerType: Convolution realTime: 1145 cpu: 2 execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_ NOT_RUN layerType: ReLU realTime: 0 cpu: 0 execType: undef
LeakyReLU_372 NOT_RUN layerType: ReLU realTime: 0 cpu: 0 execType: undef
LeakyReLU_373 NOT_RUN layerType: ReLU realTime: 0 cpu: 0 execType: undef
LeakyReLU_374 NOT_RUN layerType: ReLU realTime: 0 cpu: 0 execType: undef
LeakyReLU_375 NOT_RUN layerType: ReLU realTime: 0 cpu: 0 execType: undef
LeakyReLU_376 NOT_RUN layerType: ReLU realTime: 0 cpu: 0 execType: undef
LeakyReLU_377 NOT_RUN layerType: ReLU realTime: 0 cpu: 0 execType: undef
LeakyReLU_378 NOT_RUN layerType: ReLU realTime: 0 cpu: 0 execType: undef
input_cldnn_input_preprocess EXECUTED layerType: Reorder realTime: 143 cpu: 6 execType: reorder_data
output/YoloRegion NOT_RUN layerType: RegionYolo realTime: 0 cpu: 0 execType: undef
output/YoloRegion_cldnn_ou... EXECUTED layerType: Reorder realTime: 152 cpu: 2 execType: region_yolo_gpu_ref
Total time: 21109 microseconds
[ INFO ] Execution successful
The CPU path, however, seems to fail at load time if I keep output/YoloRegion. Is YoloRegion supported in MKLDNNPlugin?
API version ............ 1.2
Build .................. lnx_20180510
Description ....... MKLDNNPlugin
[ INFO ] Loading network files
[ INFO ] Batch size is forced to 1.
[ INFO ] Checking that the inputs are as the sample expects
[ INFO ] Checking that the outputs are as the sample expects
[ INFO ] Loading model to the plugin
[ ERROR ] std::exception
Dear Nikos,
I could reproduce your issue. Did you, as I had to, comment out a few lines to be able to run the model through the sample (for example, all the checks on the model and the output size)? Tell me if not. I first need to investigate this, as it shows that the YOLO model is not completely adapted to the sample, which might explain the errors we get later with the CPU plugin.
Best,
Severine
Hi Severine,
Thank you for confirming the repro with the CPU plug-in. To confirm: I did have to slightly modify object_detection_demo_ssd_async, as you suggested. I am sorry I forgot to mention that. This made it possible to load and run tiny YOLO on the GPU device without any issues.
The same code, however, fails on CPU. One workaround would be to edit the generated XML and remove "output/YoloRegion":
<layer id="28" name="output/YoloRegion" precision="FP32" type="RegionYolo">
    <data axis="1" classes="20" coords="4" do_softmax="1" end_axis="3" num="3"/>
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>30</dim>
            <dim>26</dim>
            <dim>26</dim>
        </port>
    </input>
    <output>
        <port id="1">
            <dim>1</dim>
            <dim>20280</dim>
        </port>
    </output>
</layer>
and also the edge
<edge from-layer="27" from-port="3" to-layer="28" to-port="0"/>
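The manual edit described here (dropping the RegionYolo layer and the edge that feeds it) could also be scripted. The following is only a minimal sketch using Python's standard xml.etree.ElementTree; the embedded XML is a trimmed, hypothetical stand-in for the generated IR file (the net name and the name of layer 27 are made up), and strip_layers_of_type is an illustrative helper, not part of the SDK.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for the generated IR .xml file.
# Only the RegionYolo layer id and the edge are taken from this thread.
IR_XML = """<net name="example">
  <layers>
    <layer id="27" name="last_conv" precision="FP32" type="Convolution"/>
    <layer id="28" name="output/YoloRegion" precision="FP32" type="RegionYolo"/>
  </layers>
  <edges>
    <edge from-layer="27" from-port="3" to-layer="28" to-port="0"/>
  </edges>
</net>"""


def strip_layers_of_type(root, layer_type):
    """Remove every <layer> of the given type and every edge touching it.

    Returns the set of removed layer ids (as strings).
    """
    layers = root.find("layers")
    edges = root.find("edges")
    removed_ids = set()
    # Iterate over copies so elements can be removed while looping.
    for layer in list(layers):
        if layer.get("type") == layer_type:
            removed_ids.add(layer.get("id"))
            layers.remove(layer)
    for edge in list(edges):
        if edge.get("from-layer") in removed_ids or edge.get("to-layer") in removed_ids:
            edges.remove(edge)
    return removed_ids


root = ET.fromstring(IR_XML)
removed = strip_layers_of_type(root, "RegionYolo")
print("removed layer ids:", sorted(removed))
print(ET.tostring(root, encoding="unicode"))
```

For a real IR file one would presumably load it with ET.parse(path), call the helper on tree.getroot(), and save the result with tree.write(new_path), keeping the original file as a backup.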
With that change the network runs fine on CPU too, but the YoloRegion step then has to be implemented separately in the application.
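As a rough idea of what implementing the region step separately might involve, here is a sketch of Darknet-style region activations for the values of a single anchor, written in plain Python. It assumes the per-anchor layout [x, y, w, h, objectness, class scores...] with coords=4 and classes=20 as in the layer above; how these values are arranged inside the actual 1x30x26x26 tensor is not shown in this thread and would need to be checked. The function names are illustrative only.

```python
import math


def logistic(x):
    """Standard sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def region_activate(raw, coords=4, classes=20):
    """Apply region-layer activations to one anchor's raw values.

    Assumed layout (an assumption, not confirmed by the thread):
    [x, y, w, h, objectness, class_0, ..., class_{classes-1}].
    x, y and objectness go through a logistic; class scores go through a
    softmax; w and h are passed through unchanged in this sketch.
    """
    expected = coords + 1 + classes
    if len(raw) != expected:
        raise ValueError("expected %d values, got %d" % (expected, len(raw)))
    x, y = logistic(raw[0]), logistic(raw[1])
    w, h = raw[2], raw[3]
    objectness = logistic(raw[coords])
    class_probs = softmax(raw[coords + 1:])
    return [x, y, w, h, objectness] + class_probs


# All-zero input: logistic gives 0.5, softmax gives a uniform distribution.
out = region_activate([0.0] * 25)
print(out[:5], sum(out[5:]))
```

Decoding boxes from these values (anchor sizes, grid offsets, thresholds, non-maximum suppression) is a further step that this sketch does not cover.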
In a future SDK release it would be nice to have a new Python or C++ sample application that demonstrates end-to-end YOLO detection, something like a new object_detection_demo_yolo_async.
Hi Nikos,
I analyzed the tiny-yolo output and realized it is not adapted to the sample. The sample expects a 4-dimensional output, while the tiny-yolo output is 2-dimensional. Compiling the samples in Debug mode makes this issue more apparent, as the sample then crashes on both CPU and GPU.
In Release mode the sample behaves unpredictably and does not necessarily crash, even when it accesses a vector out of its range. This is what we were experiencing on CPU and GPU (in my case, GPU works only half the time).
The issue is not the model but the sample, which is not adapted to it: the code that reads the output needs to be changed.
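The mismatch described here can be caught explicitly by checking the output dimensions before reading any values. The sketch below is in Python rather than the sample's C++ and only illustrates the idea; read_ssd_detections is a hypothetical helper, and the [1, 1, N, 7] detection layout is an assumption about what an SSD-style sample expects, not something stated in this thread.

```python
def read_ssd_detections(flat, dims):
    """Split a flat output buffer into per-detection rows, checking the shape first.

    Assumes an SSD-style layout [1, 1, max_proposals, object_size];
    anything that is not 4-D is rejected instead of being indexed blindly.
    """
    if len(dims) != 4:
        raise ValueError("expected a 4-D output, got %d-D: %r" % (len(dims), tuple(dims)))
    _, _, max_proposals, object_size = dims
    needed = max_proposals * object_size
    if len(flat) < needed:
        raise ValueError("buffer holds %d values, need %d" % (len(flat), needed))
    return [flat[i * object_size:(i + 1) * object_size] for i in range(max_proposals)]


# A 4-D SSD-like output is accepted and split into rows of 7 values.
rows = read_ssd_detections(list(range(14)), (1, 1, 2, 7))
print(len(rows), rows[1])

# The 2-D tiny-yolo output shape from this thread is rejected up front.
rejected = False
try:
    read_ssd_detections([0.0] * 20280, (1, 20280))
except ValueError as err:
    rejected = True
    print("rejected:", err)
```

An equivalent check in the C++ sample would fail with a clear message in both Debug and Release builds, instead of reading out of range.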
Best,
Severine
Hi Nikos, would you mind sharing your modified cpp code? I have been trying, but I still have a problem reading the output from YOLO.