Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Unexpected exception happened

Junwei_Y_Intel
Employee

Hi,

I was trying to convert my Caffe model with custom layers to IR for use by the Inference Engine. I followed the instructions in "Legacy Mode for Caffe Custom Layers", and Python managed to import caffe without error. I also modified CustomLayersMapping.xml to incorporate my custom layers; however, after running mo.py with the caffemodel and prototxt, the following error occurred:

[ ERROR ]  ----------------- INTERNAL ERROR ----------------
[ ERROR ]  Unexpected exception happened.
[ ERROR ]  Please contact Model Optimizer developers and forward the following information:
[ ERROR ]  1
[ ERROR ]  Traceback (most recent call last):
  File "/opt/intel/computer_vision_sdk_2018.0.234/deployment_tools/model_optimizer/mo/main.py", line 222, in main
    return driver(argv)
  File "/opt/intel/computer_vision_sdk_2018.0.234/deployment_tools/model_optimizer/mo/main.py", line 202, in driver
    custom_layers_mapping_path=custom_layers_mapping_path)
  File "/opt/intel/computer_vision_sdk_2018.0.234/deployment_tools/model_optimizer/mo/pipeline/caffe.py", line 165, in driver
    convert_scale_shift_to_mul_add(graph)
  File "/opt/intel/computer_vision_sdk_2018.0.234/deployment_tools/model_optimizer/mo/middle/passes/conv.py", line 119, in convert_scale_shift_to_mul_add
    scale_node = node.in_node(1)
  File "/opt/intel/computer_vision_sdk_2018.0.234/deployment_tools/model_optimizer/mo/graph/graph.py", line 237, in in_node
    return self.in_nodes()[key]
KeyError: 1

[ ERROR ]  ---------------- END OF BUG REPORT --------------

The model I tried to convert is ICNet (available at https://github.com/hszhao/ICNet), and the Caffe I cloned is BVLC Caffe. Please let me know if more information is required; any help would be appreciated!

Best regards,

Junwei

Zhen_Z_Intel
Employee

Hi Junwei,

Would you please share your model file, your Caffe program, and the steps you used to register your custom layer? Have you tried registering the custom layer with the latest MO, besides the legacy MO? We can help to check, thank you.

Best regards,
Fiona

Junwei_Y_Intel
Employee

Hi Fiona,

Thanks for your reply. I have attached the Caffe (which contains the implementation of the custom layer) and the model to convert. The content of the CustomLayersMapping.xml file is as follows:

<CustomKernelsMapping>
  <CustomLayer NativeType="Interp" hasParam="true" protoParamName="interp_param"/>
</CustomKernelsMapping>

I first built pycaffe from the attached Caffe as described in the tutorial, then modified the XML file to include my custom interpolation layer, and finally executed mo.py with the arguments --input_model=icnet_cityscapes_trainval_90k.caffemodel and --input_proto=icnet_cityscapes_trainval_90k.prototxt, as shown below.
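For reference, the assembled command looked like this (assuming python3 and that the model files sit in the working directory):

python3 mo.py --input_model=icnet_cityscapes_trainval_90k.caffemodel --input_proto=icnet_cityscapes_trainval_90k.prototxt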

I haven't tried the latest MO yet, as I will have to implement extra functions to define the operations involved in the custom layer. I also tried the old MO in CV SDK 2017: it works well, but the converted model cannot be loaded by the Inference Engine (checked with gdb).

Please let me know if I can be of further assistance.

Best regards,

Junwei

Alexander_D_Intel1

Hi Junwei,

First of all, I highly recommend switching to OpenVINO (formerly CV SDK) 2018.

Secondly, instead of using the Caffe* fallback, I recommend using the extensibility mechanism in the Model Optimizer. However, the Model Optimizer provides support for the Interp layer out of the box.

Try the following command:

python3 mo.py --input_model icnet_cityscapes_trainval_90k.caffemodel --input_proto icnet_cityscapes_trainval_90k.prototxt

Could you please try converting with it? Let me know if you face any issues.

Junwei_Y_Intel
Employee

Hi Alexander,

Thanks for your reply.

I just tried running the Model Optimizer from OpenVINO and it still throws the same error as before. I doubt that the Interp layer from DeepLab v2 is supported by the Model Optimizer, as the layer cannot be found in the Supported Caffe* Layers section (in https://software.intel.com/en-us/articles/OpenVINO-Using-Caffe), and the Model Optimizer also failed to infer the output shape of the layer. So this method does not seem to work; I will attach the error message below for your reference.

[ ERROR ]  Shape [  1 128  -1  -1] is not fully defined for output 0 of "sub24_sum". Use --input_shape with positive integers to override model input shapes.
[ ERROR ]  Cannot infer shapes or values for node "sub24_sum".
[ ERROR ]  Not all output shapes were inferred or fully defined for node "sub24_sum". For more information please refer to Model Optimizer FAQ, question #40.
[ ERROR ]  
[ ERROR ]  It can happen due to bug in custom shape infer function <function eltwise_ext.<locals>.<lambda> at 0x7fa3650e8268>.
[ ERROR ]  Or because the node inputs have incorrect values/shapes.
[ ERROR ]  Or because input shapes are incorrect (embedded to the model or passed via --input_shape).
[ ERROR ]  Run Model Optimizer with --log_level=DEBUG for more information.
[ ERROR ]  Stopped shape/value propagation at "sub24_sum" node. For more information please refer to Model Optimizer FAQ, question #38.

Thanks again for your help and please correct me if I am wrong.

0 Kudos
Alexander_D_Intel1

Hi Junwei,

The Model Optimizer does support the Interp layer. It is easy to find in <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/ops/interp.py and <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/caffe/interp_ext.py.

However, I was able to reproduce your problem. It seems that the shape inference of the Interp layer you are talking about is a bit different from the one implemented in the Model Optimizer.

In particular, I noticed that the "sub24_sum" node receives two different shapes. The wrong shape comes indirectly from the 'conv5_4_interp' layer.

To make it work, I suggest modifying the infer method in the <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/ops/interp.py file.

Change it to be:

@staticmethod
def interp_infer(node):
    ....
    elif node.shrink_factor == 1 and node.zoom_factor != 1:
        zoom_factor = node.zoom_factor
        if zoom_factor < 1:
            log.error('Zoom factor should be positive in node {}'.format(node.id))
            return None
        height_out_ = height_out_ + (height_out_ - 1) * (zoom_factor - 1)  # <-- this line replaces the old code
        width_out_ = width_out_ + (width_out_ - 1) * (zoom_factor - 1)  # <-- this line replaces the old code
    ....
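As a sanity check on the corrected lines, here is a small standalone sketch (the helper name and sample sizes are hypothetical, not taken from interp.py) reproducing the output size computation of Caffe's Interp layer:

# Hypothetical standalone check of the corrected zoom formula;
# the function name and sample sizes are illustrative only.
def zoomed_size(size_in, zoom_factor):
    # Caffe's Interp layer computes: out = in + (in - 1) * (zoom_factor - 1)
    return size_in + (size_in - 1) * (zoom_factor - 1)

print(zoomed_size(33, 2))  # 33 + 32 * 1 = 65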

 

Also, I noticed that even with this fix, the model conversion fails if fusing is used. To work around the issue, run the Model Optimizer with the appropriate flag:

python3 mo.py --input_model icnet_cityscapes_trainval_90k.caffemodel --input_proto icnet_cityscapes_trainval_90k.prototxt --disable_fusing

Meanwhile, I will address the problem with fusing.

By the way, are you going to implement an Interp extension for the Inference Engine?

Thank you in advance.

Junwei_Y_Intel
Employee

Dear Alexander,

Thanks for your help, it works!

We managed to convert the model and successfully ran the segmentation sample in the Inference Engine with the converted model on CPU. We noticed that CPU and GPU kernel extensions for the Interp layer can be found in <INSTALL_DIR>/deployment_tools/inference_engine/samples/extension and <INSTALL_DIR>/deployment_tools/inference_engine/lib/ubuntu_16.04/intel64/cldnn_global_custom_kernels, respectively. I am not sure whether these files are the extension you mentioned; if not, could you please elaborate? We do plan to implement the OpenCL kernel.

By the way, when I try to run the segmentation sample on GPU, it throws a segmentation fault. When I gdb into the program and print the backtrace, the message shown below appears. My understanding is that the kernel is supposed to be loaded after inference starts, but the program seems to terminate while loading the model into the plugin. Thus I guess the issue could be related to the model itself rather than the Interp layer; moreover, the same error also happened with the model converted using CV SDK 2017.

If possible, could you please also have a look and let me know if my understanding is correct? I am still trying to solve this issue; your help would be highly appreciated.

Best regards,

#0  0x00007ffff24417ca in CLDNNPlugin::CLDNNGraph::CreateScaleShiftPrimitive(std::shared_ptr<InferenceEngine::CNNLayer>&) () from /opt/intel/computer_vision_sdk_2018.0.234/deployment_tools/inference_engine/lib/ubuntu_16.04/intel64/libclDNNPlugin.so
#1  0x00007ffff2453e0b in CLDNNPlugin::CLDNNGraph::CreateSingleLayerPrimitive(std::shared_ptr<InferenceEngine::CNNLayer>&) () from /opt/intel/computer_vision_sdk_2018.0.234/deployment_tools/inference_engine/lib/ubuntu_16.04/intel64/libclDNNPlugin.so
#2  0x00007ffff245593b in CLDNNPlugin::CLDNNGraph::Load(InferenceEngine::ICNNNetwork&) () from /opt/intel/computer_vision_sdk_2018.0.234/deployment_tools/inference_engine/lib/ubuntu_16.04/intel64/libclDNNPlugin.so
#3  0x00007ffff2457951 in CLDNNPlugin::CLDNNGraph::CLDNNGraph(InferenceEngine::ICNNNetwork&, CLDNNPlugin::CLDNNGraph::Config const&) () from /opt/intel/computer_vision_sdk_2018.0.234/deployment_tools/inference_engine/lib/ubuntu_16.04/intel64/libclDNNPlugin.so
#4  0x00007ffff242beab in CLDNNPlugin::clDNNEngine::LoadExeNetworkImpl(InferenceEngine::ICNNNetwork&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&) () from /opt/intel/computer_vision_sdk_2018.0.234/deployment_tools/inference_engine/lib/ubuntu_16.04/intel64/libclDNNPlugin.so
#5  0x00007ffff2431898 in InferenceEngine::InferencePluginInternal::LoadNetwork(std::shared_ptr<InferenceEngine::IExecutableNetwork>&, InferenceEngine::ICNNNetwork&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&) () from /opt/intel/computer_vision_sdk_2018.0.234/deployment_tools/inference_engine/lib/ubuntu_16.04/intel64/libclDNNPlugin.so
#6  0x00007ffff242db75 in InferenceEngine::PluginBase<CLDNNPlugin::clDNNEngine>::LoadNetwork(std::shared_ptr<InferenceEngine::IExecutableNetwork>&, InferenceEngine::ICNNNetwork&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, InferenceEngine::ResponseDesc*) () from /opt/intel/computer_vision_sdk_2018.0.234/deployment_tools/inference_engine/lib/ubuntu_16.04/intel64/libclDNNPlugin.so
#7  0x00000000004708e5 in InferenceEngine::InferencePlugin::LoadNetwork (this=0x7fffffffd2c0, network=..., config=std::map with 0 elements) at /opt/intel/computer_vision_sdk_2018.0.234/deployment_tools/inference_engine/include/cpp/ie_plugin_cpp.hpp:84
#8  0x000000000046bf2d in main (argc=1, argv=0x7fffffffdb38) at /opt/intel/computer_vision_sdk_2018.0.234/deployment_tools/inference_engine/samples/segmentation_sample/main.cpp:214

Gleb_K_Intel
Employee

Hi Junwei,

I have found that your model contains a suspicious Scale layer at the beginning (data_sub1), and I suppose that is why you got an error on GPU. This Scale layer does not contain any scale or shift value (it looks like the layer just passes the input tensor through and does nothing with it), whereas the Model Optimizer expects at least a scale value; that is why you get an error without the --disable_fusing key (for MO).

The reason why your IR works on CPU but does not work with the GPU plugin is that these plugins behave differently in such exceptional cases. As I can see from the logs, the GPU plugin crashed on ScaleShift primitive creation.

So, just try to remove this Scale layer from your model.
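For reference, a pass-through Scale layer of this kind typically looks like the following in the prototxt; the bottom/top blob names here are illustrative guesses, so check the actual names in your model:

layer {
  name: "data_sub1"
  type: "Scale"
  bottom: "data"    # illustrative input blob name
  top: "data_sub1"
  # no scale/shift parameters here, so the layer effectively does nothing
}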

Junwei_Y_Intel
Employee

Hi Gleb,

Thanks for your helpful explanation. The program works after removing the Scale layer, but the generated image is quite messy. We carefully compared the implementations of the Interp layer in Caffe and OpenVINO and suspect there might be some issues in the kernel code. We are still investigating and hopefully can work it out soon.

Thanks again for your generous help.
