Model optimizer fails to convert the DistilBERT model

alexandru_irimiea · ‎02-27-2020

I saved DistilBERT as a TF SavedModel with the following code (a variation of this script: https://github.com/huggingface/tflite-android-transformers/blob/master/models_generation/distilbert.py ):

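import tensorflow as tf
from transformers import TFDistilBertForQuestionAnswering

# Load the pre-trained DistilBERT question-answering model
model = TFDistilBertForQuestionAnswering.from_pretrained('distilbert-base-cased-distilled-squad')

# Fix the input signature: batch size 1, sequence length 384, int32 token IDs
input_spec = tf.TensorSpec([1, 384], tf.int32)
model._set_inputs(input_spec, training=False)

# Export in the TensorFlow SavedModel format
model.save('distilbert-base-cased-distilled-squad', save_format='tf')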

I want to convert the model to IR to run it on an NCS2, but Model Optimizer fails with a strange error:

C:\Users\alexa\Documents\BERT\distilbert-base-cased-distilled-squad
(py37ov2020) python "c:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\mo_tf.py" --saved_model_dir ./ --data_type FP16
Model Optimizer arguments:
Common parameters:
        - Path to the Input Model:      None
        - Path for generated IR:        C:\Users\alexa\Documents\BERT\distilbert-base-cased-distilled-squad\.
        - IR output name:       saved_model
        - Log level:    ERROR
        - Batch:        Not specified, inherited from the model
        - Input layers:         Not specified, inherited from the model
        - Output layers:        Not specified, inherited from the model
        - Input shapes:         Not specified, inherited from the model
        - Mean values:  Not specified
        - Scale values:         Not specified
        - Scale factor:         Not specified
        - Precision of IR:      FP16
        - Enable fusing:        True
        - Enable grouped convolutions fusing:   True
        - Move mean values to preprocess section:       False
        - Reverse input channels:       False
TensorFlow specific parameters:
        - Input model in text protobuf format:  False
        - Path to model dump for TensorBoard:   None
        - List of shared libraries with TensorFlow custom layers implementation:        None
        - Update the configuration file with input/output node names:   None
        - Use configuration file used to generate the model with Object Detection API:  None
        - Operations to offload:        None
        - Patterns to offload:  None
        - Use the config file:  None
Model Optimizer version:        2020.1.0-61-gd349c3ba4a
[ ERROR ]  Unexpected exception happened during extracting attributes for node tf_distil_bert_for_question_answering/distilbert/transformer/layer_._5/output_layer_norm/beta/Read/ReadVariableOp.
Original exception message: 'ascii' codec can't decode byte 0x8f in position 1: ordinal not in range(128)

More details obtained with --log_level DEBUG (I didn't include the entire output as it's very long):

[ ERROR ]  Unexpected exception happened during extracting attributes for node tf_distil_bert_for_question_answering/distilbert/transformer/layer_._5/output_layer_norm/beta/Read/ReadVariableOp.
Original exception message: 'ascii' codec can't decode byte 0x8f in position 1: ordinal not in range(128)
[ 2020-02-27 14:55:16,711 ] [ DEBUG ] [ main:324 ]  Traceback (most recent call last):
  File "c:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\mo\front\extractor.py", line 749, in extract_node_attrs
    supported, new_attrs = extractor(Node(graph, node))
  File "c:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\mo\pipeline\tf.py", line 104, in <lambda>
    extract_node_attrs(graph, lambda node: tf_op_extractor(node, check_for_duplicates(tf_op_extractors)))
  File "c:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\mo\front\tf\extractor.py", line 92, in tf_op_extractor
    attrs = tf_op_extractors[op](node)
  File "c:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\mo\front\common\register_custom_ops.py", line 96, in <lambda>
    node, cls, disable_omitting_optional, enable_flattening_optional_params),
  File "c:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\mo\front\common\register_custom_ops.py", line 29, in extension_extractor
    supported = ex.extract(node)
  File "c:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\extensions\front\tf\const_ext.py", line 32, in extract
    'value': tf_tensor_content(pb_tensor.dtype, shape, pb_tensor),
  File "c:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\mo\front\tf\extractors\utils.py", line 76, in tf_tensor_content
    dtype=type_helper[0]),
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8f in position 1: ordinal not in range(128)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\mo\main.py", line 314, in main
    return driver(argv)
  File "c:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\mo\main.py", line 281, in driver
    ret_res = emit_ir(prepare_ir(argv), argv)
  File "c:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\mo\main.py", line 226, in prepare_ir
    graph = mo_tf.driver(argv)
  File "c:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\mo\pipeline\tf.py", line 104, in driver
    extract_node_attrs(graph, lambda node: tf_op_extractor(node, check_for_duplicates(tf_op_extractors)))
  File "c:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\mo\front\extractor.py", line 757, in extract_node_attrs
    ) from e
mo.utils.error.Error: Unexpected exception happened during extracting attributes for node tf_distil_bert_for_question_answering/distilbert/transformer/layer_._5/output_layer_norm/beta/Read/ReadVariableOp.
Original exception message: 'ascii' codec can't decode byte 0x8f in position 1: ordinal not in range(128)

Do you know what might be the problem?

(I also tried with the normal BERT as suggested here https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_BERT_From_Tensorflow.html and it converts successfully)
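For readers unfamiliar with the message in the logs above: it is the standard text Python produces when bytes outside the 7-bit range are decoded as ASCII. The sketch below reproduces only the message, using made-up bytes. It is not taken from Model Optimizer and makes no claim about what tf_tensor_content does internally; the helper name ascii_decode_error is hypothetical.

```python
def ascii_decode_error(raw: bytes) -> str:
    """Return the UnicodeDecodeError message for raw, or '' if it decodes cleanly."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return ""


# Made-up payload: the second byte (index 1) is 0x8f, which is not valid ASCII.
message = ascii_decode_error(b"\x01\x8f\x02")
print(message)
```

The printed text names the offending byte (0x8f) and its position (1), which matches the shape of the message in the log. That suggests binary data is being treated as ASCII text somewhere, though the thread does not establish where.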

SuryaPSC_Intel · ‎03-02-2020

Hi Alexandru,

Can you mention the command you used to freeze the model?

Have you used the --input_type image_tensor parameter while freezing the model?

You may also refer to this article on freezing a TensorFlow model.

Feel free to ask any other question.

Best Regards,

Surya

alexandru_irimiea · ‎03-02-2020

Hi Chauhan, I used the "--saved_model_dir" parameter (as you can see in the command I executed), which, if I am not mistaken, accepts a TF 2.0 saved_model directory rather than a frozen model. You can try it yourself (I used OpenVINO 2020.1.033 and TensorFlow 1.15.2).
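To make the invocation easier to reproduce, here is a minimal sketch that assembles the same Model Optimizer command as an argument list. It only uses flags that appear in this thread (--saved_model_dir, --data_type, --input_shape, --log_level); the function name build_mo_command is hypothetical, and the script path is the one from the logs and will differ on other installations.

```python
from typing import List, Optional, Sequence


def build_mo_command(
    mo_script: str,
    saved_model_dir: str,
    data_type: str = "FP16",
    input_shape: Optional[Sequence[int]] = None,
    log_level: Optional[str] = None,
) -> List[str]:
    """Assemble the mo_tf.py command line as a list of arguments."""
    cmd = ["python", mo_script,
           "--saved_model_dir", saved_model_dir,
           "--data_type", data_type]
    if input_shape is not None:
        # Model Optimizer takes the shape as a bracketed list, e.g. [1,384]
        cmd += ["--input_shape", "[" + ",".join(str(d) for d in input_shape) + "]"]
    if log_level is not None:
        cmd += ["--log_level", log_level]
    return cmd


cmd = build_mo_command(
    r"c:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\mo_tf.py",
    "./distilbert-base-cased-distilled-squad",
    input_shape=[1, 384],
    log_level="DEBUG",
)
print(cmd)
```

The resulting list can be handed to subprocess.run(cmd) from an environment where setupvars.bat has already been executed.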

I also tried the recommendation from here https://software.intel.com/en-us/forums/intel-distribution-of-openvino-toolkit/topic/842677 to freeze a TF 2.0 model to TF 1.14 using freeze_graph.py, but I ran into other issues, such as '[ ERROR ] Cannot infer shapes or values for node "StatefulPartitionedCall"' (you can see the output at the end of this post).

Do you have any recommendation about an alternative method?

BERT models are NLP models. In my case, this DistilBERT is a question-answering model that accepts a [1,384] tensor (so it's not an image tensor). I am not sure the article you referenced can help me, since there is no export_inference_graph.py here. I simply want to convert a pre-trained NLP model, which is not related to image classification.
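To illustrate what a [1,384] input means for this kind of model, here is a minimal sketch that pads or truncates a list of token IDs to length 384 and adds a batch dimension of 1. The token IDs are placeholders rather than real tokenizer output, the padding ID of 0 is an assumption that should be checked against the tokenizer's pad_token_id, and to_model_input is a hypothetical helper name.

```python
from typing import List

SEQ_LEN = 384  # sequence length from the TensorSpec in the first post
PAD_ID = 0     # assumption: padding token ID; verify against the tokenizer


def to_model_input(token_ids: List[int],
                   seq_len: int = SEQ_LEN,
                   pad_id: int = PAD_ID) -> List[List[int]]:
    """Truncate or pad token IDs to seq_len and wrap them in a batch of size 1."""
    clipped = token_ids[:seq_len]
    padded = clipped + [pad_id] * (seq_len - len(clipped))
    return [padded]


# Placeholder IDs for illustration only.
batch = to_model_input([101, 2627, 1108, 102])
print(len(batch), len(batch[0]))
```

Note that the naive truncation here simply cuts off anything past 384 tokens; a real pipeline would normally let the tokenizer handle truncation so that special tokens are preserved.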

C:\Users\alexa\Documents\BERT
(py3.7_tf1.15.2) λ C:\Users\alexa\PythonEnvironments\py3.7_tf1.15.2\Scripts\saved_model_cli.exe show --dir ./distilbert-base-cased-distilled-squad --all

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_1'] tensor_info:
        dtype: DT_INT32
        shape: (-1, 384)
        name: serving_default_input_1:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output_1'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 384)
        name: StatefulPartitionedCall:0
    outputs['output_2'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 384)
        name: StatefulPartitionedCall:1
  Method name is: tensorflow/serving/predict
WARNING:tensorflow:From c:\users\alexa\pythonenvironments\py3.7_tf1.15.2\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1781: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.

Defined Functions:
  Function Name: '__call__'
    Option #1
      Callable with:
        Argument #1
          inputs: TensorSpec(shape=(?, 384), dtype=tf.int32, name='inputs')
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
    Option #2
      Callable with:
        Argument #1
          input_1: TensorSpec(shape=(?, 384), dtype=tf.int32, name='input_1')
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
    Option #3
      Callable with:
        Argument #1
          input_1: TensorSpec(shape=(?, 384), dtype=tf.int32, name='input_1')
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
    Option #4
      Callable with:
        Argument #1
          inputs: TensorSpec(shape=(?, 384), dtype=tf.int32, name='inputs')
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']

  Function Name: '_default_save_signature'
    Option #1
      Callable with:
        Argument #1
          input_1: TensorSpec(shape=(?, 384), dtype=tf.int32, name='input_1')

  Function Name: 'call_and_return_all_conditional_losses'
    Option #1
      Callable with:
        Argument #1
          input_1: TensorSpec(shape=(?, 384), dtype=tf.int32, name='input_1')
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
    Option #2
      Callable with:
        Argument #1
          input_1: TensorSpec(shape=(?, 384), dtype=tf.int32, name='input_1')
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
    Option #3
      Callable with:
        Argument #1
          inputs: TensorSpec(shape=(?, 384), dtype=tf.int32, name='inputs')
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
    Option #4
      Callable with:
        Argument #1
          inputs: TensorSpec(shape=(?, 384), dtype=tf.int32, name='inputs')
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
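The freeze_graph.py call below passes StatefulPartitionedCall as the output node name. That value corresponds to the output tensor names listed under serving_default (StatefulPartitionedCall:0 and StatefulPartitionedCall:1) with the ":<index>" suffix removed. A minimal sketch of that mapping, with node_name as a hypothetical helper:

```python
def node_name(tensor_name: str) -> str:
    """Strip the ':<index>' output suffix from a TensorFlow tensor name."""
    name, sep, index = tensor_name.rpartition(":")
    if sep and index.isdigit():
        return name
    return tensor_name


# Output tensor names as reported by saved_model_cli for serving_default.
outputs = ["StatefulPartitionedCall:0", "StatefulPartitionedCall:1"]
unique_nodes = sorted({node_name(t) for t in outputs})
print(",".join(unique_nodes))
```

Both outputs come from the same node, so a single name is enough for --output_node_names.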

C:\Users\alexa\Documents\BERT
(py3.7_tf1.15.2) λ python "C:\Users\alexa\PythonEnvironments\py3.7_tf1.15.2\Lib\site-packages\tensorflow_core\python\tools\freeze_graph.py" --input_saved_model_dir ./distilbert-base-cased-distilled-squad --output_node_names=StatefulPartitionedCall --output_graph ./frozen-distilbert-base-cased-distilled-squad.pb
2020-03-02 15:33:53.584093: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
WARNING:tensorflow:From C:\Users\alexa\PythonEnvironments\py3.7_tf1.15.2\Lib\site-packages\tensorflow_core\python\tools\freeze_graph.py:161: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
W0302 15:33:53.603973 10368 deprecation.py:323] From C:\Users\alexa\PythonEnvironments\py3.7_tf1.15.2\Lib\site-packages\tensorflow_core\python\tools\freeze_graph.py:161: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
INFO:tensorflow:Restoring parameters from ./distilbert-base-cased-distilled-squad\variables\variables
I0302 15:33:54.415533 10368 saver.py:1284] Restoring parameters from ./distilbert-base-cased-distilled-squad\variables\variables
WARNING:tensorflow:From C:\Users\alexa\PythonEnvironments\py3.7_tf1.15.2\Lib\site-packages\tensorflow_core\python\tools\freeze_graph.py:233: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
W0302 15:33:55.457255 10368 deprecation.py:323] From C:\Users\alexa\PythonEnvironments\py3.7_tf1.15.2\Lib\site-packages\tensorflow_core\python\tools\freeze_graph.py:233: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.

Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
WARNING:tensorflow:From C:\Users\alexa\PythonEnvironments\py3.7_tf1.15.2\lib\site-packages\tensorflow_core\python\framework\graph_util_impl.py:277: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
W0302 15:33:55.460485 10368 deprecation.py:323] From C:\Users\alexa\PythonEnvironments\py3.7_tf1.15.2\lib\site-packages\tensorflow_core\python\framework\graph_util_impl.py:277: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
INFO:tensorflow:Froze 102 variables.
I0302 15:33:55.983364 10368 graph_util_impl.py:334] Froze 102 variables.
INFO:tensorflow:Converted 102 variables to const ops.
I0302 15:33:57.006666 10368 graph_util_impl.py:394] Converted 102 variables to const ops.

C:\Users\alexa\Documents\BERT
(py3.7_tf1.15.2) λ python "c:\Program Files (x86)\IntelSWTools\openvino_2020.1.033\deployment_tools\model_optimizer\mo_tf.py" --data_type FP16 --input_shape [1,384] --input_model ./frozen-distilbert-base-cased-distilled-squad.pb
Model Optimizer arguments:
Common parameters:
        - Path to the Input Model:      C:\Users\alexa\Documents\BERT\./frozen-distilbert-base-cased-distilled-squad.pb
        - Path for generated IR:        C:\Users\alexa\Documents\BERT\.
        - IR output name:       frozen-distilbert-base-cased-distilled-squad
        - Log level:    ERROR
        - Batch:        Not specified, inherited from the model
        - Input layers:         Not specified, inherited from the model
        - Output layers:        Not specified, inherited from the model
        - Input shapes:         [1,384]
        - Mean values:  Not specified
        - Scale values:         Not specified
        - Scale factor:         Not specified
        - Precision of IR:      FP16
        - Enable fusing:        True
        - Enable grouped convolutions fusing:   True
        - Move mean values to preprocess section:       False
        - Reverse input channels:       False
TensorFlow specific parameters:
        - Input model in text protobuf format:  False
        - Path to model dump for TensorBoard:   None
        - List of shared libraries with TensorFlow custom layers implementation:        None
        - Update the configuration file with input/output node names:   None
        - Use configuration file used to generate the model with Object Detection API:  None
        - Operations to offload:        None
        - Patterns to offload:  None
        - Use the config file:  None
Model Optimizer version:        2020.1.0-61-gd349c3ba4a
[ ERROR ]  Cannot infer shapes or values for node "StatefulPartitionedCall".
[ ERROR ]  Input 1 of node StatefulPartitionedCall was passed float from tf_distil_bert_for_question_answering/distilbert/embeddings/word_embeddings/weight_port_0_ie_placeholder:0 incompatible with expected resource.
[ ERROR ]
[ ERROR ]  It can happen due to bug in custom shape infer function <function tf_native_tf_node_infer at 0x000001BB66091F78>.
[ ERROR ]  Or because the node inputs have incorrect values/shapes.
[ ERROR ]  Or because input shapes are incorrect (embedded to the model or passed via --input_shape).
[ ERROR ]  Run Model Optimizer with --log_level=DEBUG for more information.
[ ERROR ]  Exception occurred during running replacer "REPLACEMENT_ID" (<class 'extensions.middle.PartialInfer.PartialInfer'>): Stopped shape/value propagation at "StatefulPartitionedCall" node.
 For more information please refer to Model Optimizer FAQ (https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_Model_Optimizer_FAQ.html), question #38.

SuryaPSC_Intel · ‎03-03-2020

Hi Alexandru,

Have you tried using the --disable_nhwc_to_nchw parameter with mo_tf.py?

The GitHub link you specified requires the nightly version of TensorFlow and might thus be unstable.

Best Regards,

Surya

alexandru_irimiea · ‎03-03-2020

Hi Chauhan, it doesn't seem related to these parameters. I use TF 1.15.

At first I thought it might be related to unsupported operators, but this looks like a bug in the model optimizer, because I don't even reach the phase of executing the model on any device (CPU or MYRIAD).

C:\Users\alexa\Documents\BERT
λ C:\Users\alexa\PythonEnvironments\py3.7_tf1.15.2\Scripts\activate.bat

C:\Users\alexa\Documents\BERT
(py3.7_tf1.15.2) λ "C:\Program Files (x86)\IntelSWTools\openvino\bin\setupvars.bat"
Python 3.7.6
ECHO is off.
PYTHONPATH=C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\open_model_zoo\tools\accuracy_checker;C:\Program Files (x86)\IntelSWTools\openvino\python\python3.7;C:\Program Files (x86)\IntelSWTools\openvino\python\python3;C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer;
[setupvars.bat] OpenVINO environment initialized

C:\Users\alexa\Documents\BERT
(py3.7_tf1.15.2) λ pip show tensorflow
Name: tensorflow
Version: 1.15.2
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: c:\users\alexa\pythonenvironments\py3.7_tf1.15.2\lib\site-packages
Requires: wrapt, google-pasta, termcolor, wheel, keras-preprocessing, tensorboard, astor, numpy, six, gast, absl-py, grpcio, opt-einsum, keras-applications, tensorflow-estimator, protobuf
Required-by:

C:\Users\alexa\Documents\BERT
(py3.7_tf1.15.2) λ cd NCS\

C:\Users\alexa\Documents\BERT\NCS
(py3.7_tf1.15.2) λ ls distilbert-base-cased-distilled-squad
assets/  saved_model.pb  variables/

C:\Users\alexa\Documents\BERT\NCS
(py3.7_tf1.15.2) λ python "c:\Program Files (x86)\IntelSWTools\openvino_2020.1.033\deployment_tools\model_optimizer\mo_tf.py" --data_type FP16 --input_shape [1,384] --disable_nhwc_to_nchw --saved_model_dir ./distilbert-base-cased-distilled-squad
Model Optimizer arguments:
Common parameters:
        - Path to the Input Model:      None
        - Path for generated IR:        C:\Users\alexa\Documents\BERT\NCS\.
        - IR output name:       saved_model
        - Log level:    ERROR
        - Batch:        Not specified, inherited from the model
        - Input layers:         Not specified, inherited from the model
        - Output layers:        Not specified, inherited from the model
        - Input shapes:         [1,384]
        - Mean values:  Not specified
        - Scale values:         Not specified
        - Scale factor:         Not specified
        - Precision of IR:      FP16
        - Enable fusing:        True
        - Enable grouped convolutions fusing:   True
        - Move mean values to preprocess section:       False
        - Reverse input channels:       False
TensorFlow specific parameters:
        - Input model in text protobuf format:  False
        - Path to model dump for TensorBoard:   None
        - List of shared libraries with TensorFlow custom layers implementation:        None
        - Update the configuration file with input/output node names:   None
        - Use configuration file used to generate the model with Object Detection API:  None
        - Operations to offload:        None
        - Patterns to offload:  None
        - Use the config file:  None
Model Optimizer version:        2020.1.0-61-gd349c3ba4a
[ ERROR ]  Unexpected exception happened during extracting attributes for node tf_distil_bert_for_question_answering/distilbert/transformer/layer_._5/output_layer_norm/beta/Read/ReadVariableOp.
Original exception message: 'ascii' codec can't decode byte 0x8f in position 1: ordinal not in range(128)

SuryaPSC_Intel · ‎03-03-2020

Hi Alexandru,

Can you share the model and necessary files so that we can replicate the optimization on our end? If required, I can send a PM to share the model privately.

Best Regards,

Surya

alexandru_irimiea · ‎03-03-2020

I attached the saved_model here. It's none other than the one retrieved with the Python code I attached in the first post. Thank you!

SuryaPSC_Intel · ‎03-04-2020

Hi Alexandru,

The requirement.txt clearly states that the TF-nightly version is required to generate the model, which indicates that the model was trained using TF-nightly. This is not a supported TensorFlow version for OpenVINO.

Kindly try training and freezing the model using tensorflow-1.14, and then try optimizing.

Best regards,
Surya
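Since the discussion turns on which TensorFlow build produced the model, a quick way to classify a version string may help. This is a minimal sketch: parse_version is a hypothetical helper, and treating any version containing "dev" as a nightly or development build is an assumption about the naming scheme, not something stated in this thread. The string to check can be read from tf.__version__ in the environment used for export.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, bool]:
    """Return (major, minor, is_dev_build) for a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # Assumption: nightly/development builds carry a 'dev' marker.
    is_dev = "dev" in version
    return int(match.group(1)), int(match.group(2)), is_dev


# "1.15.2" is the version reported by pip show in this thread;
# the second string is a made-up example in a nightly-style format.
stable = parse_version("1.15.2")
nightly_style = parse_version("2.2.0.dev20200303")
print(stable, nightly_style)
```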