Re: Conversion of frozen TensorFlow Graph to Movidius Graph

idata · ‎06-08-2018

I'm getting the following errors when trying to convert a frozen model .pb file using the mvNCCompile command on Ubuntu 16.04 with TensorFlow 1.7 and the MNCS SDK 2.04.

The first error is:

Executor failed to create kernel. Invalid argument: NodeDef mentions attr 'dilations' not in Op<name=Conv2D; signature=input:T, filter:T -> output:T; attr=T:type,allowed=[DT_HALF, DT_FLOAT]; attr=strides:list(int); attr=use_cudnn_on_gpu:bool,default=true; attr=padding:string,allowed=["SAME", "VALID"]; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]>;

The model is a GAN trained on a GPU then saved as a frozen model using a Tensorflow CPU install.

Graph files are here:

https://drive.google.com/drive/folders/1_v-XhhclGhbrrfVGM7Q0JiQYDDmrO_aP?usp=sharing

Full stacktrace attached.

idata · ‎06-09-2018

I thought it could be due to GPU-specific instructions, so I trained the model on TF CPU instead and got the same errors.

CPU graph files:

https://drive.google.com/drive/folders/1JuOM7yh_9pxaM_kt2N4IFED0lwpH3kL4?usp=sharing

This is a link to the GAN code:

https://github.com/andrewginns/CycleGAN-Tensorflow-PyTorch

idata · ‎06-09-2018

Tried again with TF 1.6 CPU and python 2.7 to train the network. Same error as before.

idata · ‎06-09-2018

I managed to fix the previous errors by adding some code to my freeze_graph.py to strip the offending attributes:

    for node in output_graph_def.node:
      if node.op == 'RefSwitch':
        node.op = 'Switch'
        for index in xrange(len(node.input)):
          if 'moving_' in node.input[index]:
            node.input[index] = node.input[index] + '/read'
      elif node.op == 'AssignSub':
        node.op = 'Sub'
        if 'use_locking' in node.attr: del node.attr['use_locking']
      if "dilations" in node.attr: del node.attr["dilations"]
      if "index_type" in node.attr: del node.attr["index_type"]

However I'm now getting:

    if d.decorator_argspec is not None), _inspect.getargspec(target))
    [Error 5] Toolkit Error: Stage Details Not Supported: FusedBatchNorm inputs mean and variance are not defined.  The graph is not created for inference.

I'm assuming that I need to convert the graph for inference using the TF Graph Transform Tool, as in this thread: https://ncsforum.movidius.com/discussion/590/indexerror-list-index-out-of-range-trying-to-compile-tf-model

I'm a little unclear, though, on how the mean and variance inputs of the FusedBatchNorm should be defined.
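For context: as far as I know, FusedBatchNorm uses its mean and variance inputs only in inference mode, where they are the moving averages accumulated during training. In training mode those inputs are left empty and batch statistics are computed instead, which seems to be what the toolkit error is pointing at. A minimal per-channel sketch in plain Python of what inference mode computes and how it folds into a single multiply-add. The function names are hypothetical and the epsilon default of 1e-3 is an assumption:

```python
import math


def batch_norm_inference(x, mean, variance, scale, offset, epsilon=1e-3):
    """Normalize one value with fixed (moving-average) statistics."""
    return scale * (x - mean) / math.sqrt(variance + epsilon) + offset


def fold_batch_norm(mean, variance, scale, offset, epsilon=1e-3):
    """Collapse the four batch-norm parameters into y = a * x + b."""
    a = scale / math.sqrt(variance + epsilon)
    b = offset - mean * a
    return a, b


# Example with made-up numbers: both forms give the same result.
a, b = fold_batch_norm(mean=0.5, variance=4.0, scale=2.0, offset=1.0)
direct = batch_norm_inference(3.0, mean=0.5, variance=4.0, scale=2.0, offset=1.0)
folded = a * 3.0 + b
```

If the graph is rebuilt with batch norm in inference mode before freezing, the moving mean and variance from the checkpoint should end up as constants feeding these inputs.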

idata · ‎06-09-2018

I reverted to the official freeze_graph and transform_graph instructions, built with bazel, from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md

Ubuntu 16.04, python 2.7, TF 1.6 CPU, MNCS SDK 2.04

I added some code to the standard freeze_graph.py to try to account for the following errors:

1) moving average error:

    ValueError: graph_def is invalid at node 'a2b_generator/Conv/BatchNorm/AssignMovingAvg': Input tensor 'a2b_generator/Conv/BatchNorm/moving_mean:0' Cannot convert a tensor of type float32 to an input of type float32_ref.

2) dilation error:

    NodeDef mentions attr 'dilations' not in Op<name=Conv2D; signature=input:T, filter:T -> output:T;

However, it seems that removing the 'dilations' attr isn't working, because mvNCCompile still returns the original error from post 1. Neither the bazel version of freeze_graph nor my simple_freeze_graph works.

freeze_graph.py modifications:

    #Fix node name errors
    for node in output_graph_def.node:
      if node.op == 'RefSwitch':
        node.op = 'Switch'
        for index in xrange(len(node.input)):
          if 'moving_' in node.input[index]:
            node.input[index] = node.input[index] + '/read'
      elif node.op == 'AssignSub':
        node.op = 'Sub'
        if 'use_locking' in node.attr:
          del node.attr['use_locking']
      if "index_type" in node.attr:
        del node.attr["index_type"]
      if "dilations" in node.attr:
        del node.attr["dilations"]
        print("Removed attr 'dilations'")
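
One way to check whether the stripping actually reached the file that mvNCCompile reads is to reload that file and list any nodes that still carry the attribute. A minimal sketch, assuming TF 1.x for the loading step shown in the comments. `nodes_with_attr` is a hypothetical helper name, and it only relies on each node having `name`, `op` and `attr`, as a `NodeDef` does:

```python
def nodes_with_attr(nodes, attr_name):
    """Return (name, op) pairs for every node that still has attr_name set."""
    return [(node.name, node.op) for node in nodes if attr_name in node.attr]


# Usage sketch (assumes TensorFlow 1.x and the output path used in this post):
#   import tensorflow as tf
#   graph_def = tf.GraphDef()
#   with open("/tmp/optimized_graph.pb", "rb") as f:
#       graph_def.ParseFromString(f.read())
#   print(nodes_with_attr(graph_def.node, "dilations"))
```

An empty list would suggest the attribute is gone from that particular file, so the error would have to come from somewhere else, for example from a file other than the one that was patched.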

My freeze_graph command is:

bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=graph.pb \
--input_checkpoint="Epoch_(0)_(100of962).ckpt" \
--output_graph=/tmp/frozen_graph.pb --output_node_names=a2b_generator/Tanh

My transform_graph command is:

bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=/tmp/frozen_graph.pb \
--out_graph=/tmp/optimized_graph.pb \
--inputs='Placeholder' \
--outputs='a2b_generator/Tanh' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  remove_nodes(op=Identity, op=CheckNumerics)
  fold_constants(ignore_errors=true)
  fold_batch_norms'

My mvNCCompile command is:

mvNCCompile /tmp/optimized_graph.pb -in Placeholder -on a2b_generator/Tanh

All files here: https://drive.google.com/drive/folders/1QKptbWQPqS974bcSfTo_rFYAbuLkhLIt?usp=sharing

-graph.pb is the GraphDef proto

-frozen_graph.pb is the output from freeze_graph

-optimized_graph.pb is the output from transform_graph and the input to the mvNCCompile command

idata · ‎06-11-2018

@ginnsandrew At the moment, the NCSDK doesn't support Generative Adversarial Networks.

idata · ‎06-12-2018

@Tome_at_Intel For all intents and purposes, a GAN is just a way of training a convolutional network.

Is the error I'm getting specific to the use of a GAN? During inference the network should just look like a convolutional net. As far as I can tell, the error is due to a mismatch between the TF versions used in training and inference. Does the MNC SDK use something other than Python 2.7 and TF 1.6?

idata · ‎06-12-2018

@Tome_at_Intel

So it turns out the previous error was caused by the use of a different TF version when freezing and transforming my graph file. Using TF 1.6 for the freeze_graph and transform_graph fixed it.

I now get a new error:

mvNCCompile /tmp/optimized_graph.pb -in Placeholder -on a2b_generator/Tanh
mvNCCompile v02.00, Copyright @ Intel Corporation 2017

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py:871: DeprecationWarning: builtin type EagerTensor has no __module__ attribute
/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/tf_inspect.py:45: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/tf_inspect.py:45: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
shape: [1, 256, 256, 3]
res.shape:  (1, 256, 256, 3)
TensorFlow output shape:  (256, 256, 3)
Traceback (most recent call last):
  File "/usr/local/bin/mvNCCompile", line 156, in <module>
    create_graph(args.network, args.inputnode, args.outputnode, args.outfile, args.nshaves, args.inputsize, args.weights, args.explicit_concat, args.ma2480, args.scheduler, args)
  File "/usr/local/bin/mvNCCompile", line 137, in create_graph
    load_ret = load_network(args, parser, myriad_config)
  File "/usr/local/bin/ncsdk/Controllers/Scheduler.py", line 95, in load_network
    network.optimize()
  File "/usr/local/bin/ncsdk/Models/Network.py", line 250, in optimize
    self.convert_network_input_to_yxz()
  File "/usr/local/bin/ncsdk/Models/Network.py", line 337, in convert_network_input_to_yxz
    if self.stageslist[0].op in [StageType.fully_connected_layer, StageType.convolution, StageType.max_pooling,
IndexError: list index out of range

idata · ‎06-12-2018

@ginnsandrew Apologies, I meant that we don't have a GAN example for the NCSDK at the moment. For Python, the NCSDK can be used with Python 3.5 also. I am looking into your issue and I'll get back to you as soon as I find something. Thanks.

idata · ‎06-12-2018

@Tome_at_Intel Thanks. My latest files are here: https://drive.google.com/drive/folders/1U_sw-P-qYZ4ACtso5HqI0thcmbCmOa1H?usp=sharing

graph.pb - GraphDef proto

frozen_graph.pb - Output from freeze_graph

optimized_graph.pb - Output from transform_graph

Python 2.7.12, TF 1.6, Bazel 0.11.0, MNC SDK 2.04, Ubuntu 16.04.4

Commands used:

bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=graph.pb \
--input_checkpoint="Epoch_(0)_(100of962).ckpt" \
--output_graph=/tmp/frozen_graph.pb --output_node_names=a2b_generator/Tanh


bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=/tmp/frozen_graph.pb \
--out_graph=/tmp/optimized_graph.pb \
--inputs='Placeholder' \
--outputs='a2b_generator/Tanh' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  remove_nodes(op=Identity, op=CheckNumerics)
  fold_constants(ignore_errors=true)
  fold_batch_norms'

mvNCCompile /tmp/optimized_graph.pb -in Placeholder -on a2b_generator/Tanh

idata · ‎06-14-2018

@ginnsandrew Just wanted to give you an update. It looks like the NCSDK was not able to find any of the ops while parsing the graph file for the model. I'm not sure why this is happening, because I know for a fact that we do support some of these ops. While debugging the model, I printed the nodes from the stageslist list inside Network.py and it was empty, which is why you receive the list index out of range error. I found this strange because when I used a separate script to read and print the nodes from the model, they were all there.
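
A standalone check of the kind described above could look roughly like the following. This is only a sketch, assuming TF 1.x (where `tf.GraphDef` is available); the helper names are hypothetical and this is not the script that was actually used:

```python
from collections import Counter


def load_graph_def(path):
    """Parse a frozen .pb file into a GraphDef (assumes TensorFlow 1.x)."""
    import tensorflow as tf  # imported lazily so count_ops works without TF

    graph_def = tf.GraphDef()
    with open(path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return graph_def


def count_ops(nodes):
    """Count how many nodes of each op type the graph contains."""
    return Counter(node.op for node in nodes)


# Usage sketch, with the path taken from the commands earlier in the thread:
#   graph_def = load_graph_def("/tmp/optimized_graph.pb")
#   for op, n in count_ops(graph_def.node).most_common():
#       print(op, n)
```

Comparing the op types listed this way against the ops the NCSDK documents as supported may help narrow down why the parser ends up with an empty stage list.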

idata · ‎06-15-2018

@Tome_at_Intel Thanks for looking into it, I really appreciate it. I actually think it was a problem with the way I was saving the graphs. For some reason the standard freeze_graph tools don't seem to work with graphs with BatchNorms in them (which mine have).

With my new files I no longer get the list index out of range error.

Instead, my new graph, optimized_graph.pb, produces this error:

mvNCCompile /media/sf_vBox/optimized_graph.pb -in inputA -on a2b_generator/output_image
/usr/local/bin/ncsdk/Controllers/Parsers/TensorFlowParser/Convolution.py:44: SyntaxWarning: assertion is always true, perhaps remove parentheses?
  assert(False, "Layer type not supported by Convolution: " + obj.type)
mvNCCompile v02.00, Copyright @ Intel Corporation 2017

/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/tf_inspect.py:45: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
shape: [1, 256, 256, 3]
Traceback (most recent call last):
  File "/usr/local/bin/mvNCCompile", line 169, in <module>
    create_graph(args.network, args.image, args.inputnode, args.outputnode, args.outfile, args.nshaves, args.inputsize, args.weights, args.explicit_concat, args.ma2480, args.scheduler, args.new_parser, args)
  File "/usr/local/bin/mvNCCompile", line 148, in create_graph
    load_ret = load_network(args, parser, myriad_config)
  File "/usr/local/bin/ncsdk/Controllers/Scheduler.py", line 100, in load_network
    parse_ret = parse_tensor(arguments, myriad_conf)
  File "/usr/local/bin/ncsdk/Controllers/TensorFlowParser.py", line 319, in parse_tensor
    item_shape = output_item.shape.as_list()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_shape.py", line 820, in as_list
    raise ValueError("as_list() is not defined on an unknown TensorShape.")
ValueError: as_list() is not defined on an unknown TensorShape.
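
The ValueError is raised when `as_list()` is called on a shape whose rank is unknown. A rough way to find which tensors in the graph are affected is sketched below, assuming TF 1.x for the import step shown in the comments. The helper names are hypothetical, and the code only relies on `graph.get_operations()`, `op.outputs`, `tensor.name` and `tensor.shape`:

```python
def safe_as_list(shape):
    """Return shape.as_list(), or None when the rank is unknown."""
    try:
        return shape.as_list()
    except ValueError:
        return None


def tensors_with_unknown_rank(graph):
    """Names of output tensors whose shape cannot be converted to a list.

    Shapes with a known rank but unknown dimensions are not reported.
    """
    names = []
    for op in graph.get_operations():
        for tensor in op.outputs:
            if safe_as_list(tensor.shape) is None:
                names.append(tensor.name)
    return names


# Usage sketch (assumes TensorFlow 1.x and a graph_def parsed from the .pb file):
#   import tensorflow as tf
#   with tf.Graph().as_default() as graph:
#       tf.import_graph_def(graph_def, name="")
#   print(tensors_with_unknown_rank(graph))
```

If the output node shows up in that list, the shape information is being lost somewhere between the input placeholder and the output, and the first tensor reported is a reasonable place to start looking.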

The new files can be found here: https://github.com/andrewginns/CycleGAN-Tensorflow-PyTorch/releases/tag/tf1.7-py3.6.4

Instructions to reproduce what I'm doing are here: https://github.com/andrewginns/MSc-Project