Solved: NCS2 freezes when loading BERT model

alexandru_irimiea · ‎02-25-2020

I used the instructions from here to convert a BERT model to IR: https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_BERT_From_Tensorflow.html

However, when I load it on NCS2, nothing happens: the call to IECore's load_network(network=net, device_name="MYRIAD") never returns. Strangely, the same model loads on NCS1 with the same method.
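One way to tell a genuine hang from a slow load is to run the call in a daemon thread and wait on it with a timeout. The following is only a minimal sketch and was not part of the original post: call_with_timeout is a hypothetical helper, and it assumes the native load_network call releases the Python GIL while it blocks (if it does not, a separate process would be needed instead).

```python
import threading


def call_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn(*args, **kwargs) in a daemon thread.

    Returns (True, result) if fn finished within timeout_s seconds,
    or (False, None) if it was still running when the timeout expired.
    Exceptions raised by fn are re-raised in the caller.
    """
    outcome = {}

    def target():
        try:
            outcome["result"] = fn(*args, **kwargs)
        except Exception as exc:  # hand the error back to the caller
            outcome["error"] = exc

    # daemon=True so a stuck call does not keep the interpreter alive at exit
    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)

    if worker.is_alive():
        return False, None
    if "error" in outcome:
        raise outcome["error"]
    return True, outcome["result"]
```

With OpenVINO installed, it could be used as call_with_timeout(ie.load_network, 120, network=net, device_name="MYRIAD"); the 120-second limit is an arbitrary choice.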

Do you know what might be the problem?

SIRIGIRI_V_Intel · ‎03-02-2020

Hi Alexandru,

It seems that the model has a few unsupported layers (ReduceMean, Erf). Please check the list of layers supported on MYRIAD. You can try implementing the unsupported layers as custom layers.
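A rough way to check this from Python is to compare the network's layers with what the plugin reports as supported. This is a sketch under assumptions: find_unsupported_layers and report_unsupported are hypothetical helper names, and net.layers is assumed to be available as in the 2020.1 Python API (later releases may differ).

```python
def find_unsupported_layers(layer_names, supported_map):
    """Return the layer names the plugin did not claim, in their original order."""
    return [name for name in layer_names if name not in supported_map]


def report_unsupported(ie, net, device="MYRIAD"):
    """List the layers of `net` that `device` does not report as supported."""
    # query_network returns a dict mapping each supported layer name to a device.
    supported_map = ie.query_network(network=net, device_name=device)
    # Assumption: net.layers is a dict keyed by layer name (2020.1 API).
    return find_unsupported_layers(list(net.layers.keys()), supported_map)
```

An empty result would suggest that unsupported layers are not the cause of the problem on that device.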

Regards,

Ram prasad

alexandru_irimiea · ‎03-03-2020

Hi Ram, I am not sure that is the case, because the model is loaded and executed successfully on NCS1. You can try it yourself.

RandallMan_B_Intel · ‎03-12-2020

Hello Alexandru,

I converted the TensorFlow* BERT Model to the Intermediate Representation successfully.

I used the following command on both NCS1 and NCS2:

~/Downloads$ python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py --input_meta_graph multilingual_L-12_H-768_A-12/bert_model.ckpt.meta --output bert/pooler/dense/Tanh --disable_nhwc_to_nchw --input Placeholder{i32},Placeholder_1{i32},Placeholder_2{i32}
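For scripting the same conversion, the command could be assembled as an argument list, which avoids any shell quoting questions around the {i32} suffixes. This is only a sketch: build_mo_command is a hypothetical helper, and it uses nothing beyond the flags shown in the command above.

```python
def build_mo_command(mo_script, meta_graph, output_node, input_names):
    """Build the argv list for the Model Optimizer call shown above."""
    # Each input placeholder is forced to int32 with the {i32} suffix.
    input_spec = ",".join("{}{{i32}}".format(name) for name in input_names)
    return [
        "python3", mo_script,
        "--input_meta_graph", meta_graph,
        "--output", output_node,
        "--disable_nhwc_to_nchw",
        "--input", input_spec,
    ]


cmd = build_mo_command(
    "/opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py",
    "multilingual_L-12_H-768_A-12/bert_model.ckpt.meta",
    "bert/pooler/dense/Tanh",
    ["Placeholder", "Placeholder_1", "Placeholder_2"],
)
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where the Model Optimizer is installed.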

And I got the following results:

Model Optimizer arguments:

Common parameters:

- Path to the Input Model: None

- Path for generated IR: /home/ncs1/Downloads/.

- IR output name: bert_model.ckpt

- Log level: ERROR

- Batch: Not specified, inherited from the model

- Input layers: Placeholder{i32},Placeholder_1{i32},Placeholder_2{i32}

- Output layers: bert/pooler/dense/Tanh

- Input shapes: Not specified, inherited from the model

- Mean values: Not specified

- Scale values: Not specified

- Scale factor: Not specified

- Precision of IR: FP32

- Enable fusing: True

- Enable grouped convolutions fusing: True

- Move mean values to preprocess section: False

- Reverse input channels: False

TensorFlow specific parameters:

- Input model in text protobuf format: False

- Path to model dump for TensorBoard: None

- List of shared libraries with TensorFlow custom layers implementation: None

- Update the configuration file with input/output node names: None

- Use configuration file used to generate the model with Object Detection API: None

- Operations to offload: None

- Patterns to offload: None

- Use the config file: None

Model Optimizer version: 2020.1.0-61-gd349c3ba4a

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.

_np_qint8 = np.dtype([("qint8", np.int8, 1)])

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.

[ SUCCESS ] Generated IR version 10 model.

[ SUCCESS ] XML file: /home/ncs1/Downloads/./bert_model.ckpt.xml

[ SUCCESS ] BIN file: /home/ncs1/Downloads/./bert_model.ckpt.bin

[ SUCCESS ] Total execution time: 45.84 seconds.

[ SUCCESS ] Memory consumed: 3681 MB.

Regards,

Randall B.

RandallMan_B_Intel · ‎03-17-2020

Hello Alexandru,

Thanks for your patience. Please try the following steps to load the model successfully on the Intel® Neural Compute Stick 2:

Turn off VPU_HW_STAGES_OPTIMIZATION and re-test.
Add the following after declaring IECore() in the demo file:

if args.device == "MYRIAD":
    ie.set_config({'VPU_HW_STAGES_OPTIMIZATION': 'NO'}, "MYRIAD")
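Put together, the workaround might look like the sketch below. The helper names load_on_device and main are hypothetical; the IR file names are taken from the Model Optimizer output earlier in this thread, and the read_network call assumes the 2020.1 Python API.

```python
MYRIAD_NO_HW_STAGES = {"VPU_HW_STAGES_OPTIMIZATION": "NO"}


def load_on_device(ie, net, device):
    """Load `net` on `device`, disabling HW stage optimization for MYRIAD."""
    if device == "MYRIAD":
        # The option has to be set before load_network is called.
        ie.set_config(MYRIAD_NO_HW_STAGES, "MYRIAD")
    return ie.load_network(network=net, device_name=device)


def main():
    # Imported here so the helper above can be used without OpenVINO installed.
    from openvino.inference_engine import IECore

    ie = IECore()
    net = ie.read_network(model="bert_model.ckpt.xml",
                          weights="bert_model.ckpt.bin")
    return load_on_device(ie, net, "MYRIAD")
```

Calling main() on a machine with OpenVINO and an NCS2 attached should return the loaded network; for other devices the configuration step is skipped.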

Best regards,

Randall B.

alexandru_irimiea · ‎03-18-2020

Hello Randall,

Indeed, disabling the VPU_HW_STAGES_OPTIMIZATION works: the model load no longer freezes (it takes a few seconds).

One question: does the problem reproduce in your case? I mean, without disabling the optimization option, does the model take forever to load on your NCS2? I am thinking about buying another NCS2, but I first want to find out whether my device is defective, so that I don't buy another one only to hit the same issue.

Thank you!

RandallMan_B_Intel · ‎03-19-2020

Hello Alexandru,

Your Intel Neural Compute Stick 2 is working as expected; we see the same behavior on our side. The default value of VPU_HW_STAGES_OPTIMIZATION works for the majority of models, but some models, like yours, work best with it disabled.
I hope this answers your question.

Regards,

Randall B.