I'm getting the following errors when trying to convert a frozen model (.pb file) with the mvNCCompile command on Ubuntu 16.04, with TensorFlow 1.7 and NCSDK 2.04.
The first error is:
Executor failed to create kernel. Invalid argument: NodeDef mentions attr 'dilations' not in Op<name=Conv2D; signature=input:T, filter:T -> output:T; attr=T:type,allowed=[DT_HALF, DT_FLOAT]; attr=strides:list(int); attr=use_cudnn_on_gpu:bool,default=true; attr=padding:string,allowed=["SAME", "VALID"]; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]>;
The model is a GAN trained on a GPU, then saved as a frozen model using a CPU-only TensorFlow install.
Graph files are here:
https://drive.google.com/drive/folders/1_v-XhhclGhbrrfVGM7Q0JiQYDDmrO_aP?usp=sharing
Full stacktrace attached.
I thought it could be due to GPU-specific instructions, so I retrained the model with the CPU build of TensorFlow and got the same errors.
CPU graph files:
https://drive.google.com/drive/folders/1JuOM7yh_9pxaM_kt2N4IFED0lwpH3kL4?usp=sharing
This is a link to the GAN code:
https://github.com/andrewginns/CycleGAN-Tensorflow-PyTorch
Tried again, training the network with TF 1.6 (CPU) and Python 2.7. Same error as before.
I managed to fix the previous errors by adding some code to my freeze_graph.py to strip attributes:
    for node in output_graph_def.node:
        if node.op == 'RefSwitch':
            node.op = 'Switch'
            for index in xrange(len(node.input)):
                if 'moving_' in node.input[index]:
                    node.input[index] = node.input[index] + '/read'
        elif node.op == 'AssignSub':
            node.op = 'Sub'
            if 'use_locking' in node.attr: del node.attr['use_locking']
        if "dilations" in node.attr: del node.attr["dilations"]
        if "index_type" in node.attr: del node.attr["index_type"]
However I'm now getting:
if d.decorator_argspec is not None), _inspect.getargspec(target))
[Error 5] Toolkit Error: Stage Details Not Supported: FusedBatchNorm inputs mean and variance are not defined. The graph is not created for inference.
I'm assuming that I need to convert the graph for inference using the TF Graph Transform Tool, as in this thread: https://ncsforum.movidius.com/discussion/590/indexerror-list-index-out-of-range-trying-to-compile-tf-model
Though I'm a little unclear how my mean and variance for the FusedBatchNorm should be defined.
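The "not created for inference" message suggests that the batch norm nodes were exported in training mode. As a rough check, and assuming the standard FusedBatchNorm op whose boolean is_training attribute defaults to true, the sketch below lists the nodes that are still in training mode. The stand-in graph and its node names are invented for illustration; on a real model, pass the GraphDef parsed from the frozen .pb file instead.

```python
from types import SimpleNamespace


def training_mode_batch_norms(graph_def):
    """Return the names of FusedBatchNorm nodes that are still in training mode."""
    flagged = []
    for node in graph_def.node:
        if node.op != "FusedBatchNorm":
            continue
        # is_training is a bool attr (stored in the .b field of the attr value).
        # Assumption: the op default is true, so a missing attr is treated as
        # training mode as well.
        if "is_training" not in node.attr or node.attr["is_training"].b:
            flagged.append(node.name)
    return flagged


# Hypothetical stand-in for a parsed GraphDef, so the sketch runs without TensorFlow.
demo_graph = SimpleNamespace(node=[
    SimpleNamespace(name="bn_train", op="FusedBatchNorm",
                    attr={"is_training": SimpleNamespace(b=True)}),
    SimpleNamespace(name="bn_infer", op="FusedBatchNorm",
                    attr={"is_training": SimpleNamespace(b=False)}),
    SimpleNamespace(name="conv", op="Conv2D", attr={}),
])

print(training_mode_batch_norms(demo_graph))
```

If any names are listed, rebuilding the generator with the batch norm layers in inference mode before freezing is probably needed, so that the moving mean and variance are used as inputs.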
Reverted to using the official freeze_graph instructions and transform_graph using bazel from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md
Ubuntu 16.04, Python 2.7, TF 1.6 CPU, NCSDK 2.04
Added some code to the standard freeze_graph.py to try to account for the following errors
1) moving average error:
ValueError: graph_def is invalid at node 'a2b_generator/Conv/BatchNorm/AssignMovingAvg': Input tensor 'a2b_generator/Conv/BatchNorm/moving_mean:0' Cannot convert a tensor of type float32 to an input of type float32_ref.
2) dilation error:
NodeDef mentions attr 'dilations' not in Op<name=Conv2D; signature=input:T, filter:T -> output:T;
However, it seems the removal of the 'dilations' node.attr isn't working, because mvNCCompile still returns the original error from my first post. Neither the bazel version of freeze_graph nor my simple_freeze_graph works.
freeze_graph.py modifications:
    # Fix node name errors
    for node in output_graph_def.node:
        if node.op == 'RefSwitch':
            node.op = 'Switch'
            for index in xrange(len(node.input)):
                if 'moving_' in node.input[index]:
                    node.input[index] = node.input[index] + '/read'
        elif node.op == 'AssignSub':
            node.op = 'Sub'
            if 'use_locking' in node.attr:
                del node.attr['use_locking']
        if "index_type" in node.attr:
            del node.attr["index_type"]
        if "dilations" in node.attr:
            del node.attr["dilations"]
            print("Removed attr 'dilations'")
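One way to tell whether the attribute removal actually reached the file handed to mvNCCompile is to reload that file and list the nodes that still carry the attribute. The following is a minimal sketch of such a check, not part of freeze_graph.py. The helper name nodes_with_attr and the node name a2b_generator/Conv/Conv2D are hypothetical, and the stand-in graph exists only so the sketch runs without TensorFlow.

```python
from types import SimpleNamespace


def nodes_with_attr(graph_def, attr_name):
    """Return (name, op) for every node that still carries attr_name."""
    return [(node.name, node.op)
            for node in graph_def.node
            if attr_name in node.attr]


# Hypothetical stand-in for a parsed GraphDef; the attr values are irrelevant here.
demo_graph = SimpleNamespace(node=[
    SimpleNamespace(name="a2b_generator/Conv/Conv2D", op="Conv2D",
                    attr={"dilations": None, "strides": None}),
    SimpleNamespace(name="a2b_generator/Tanh", op="Tanh", attr={"T": None}),
])

leftover = nodes_with_attr(demo_graph, "dilations")
print(leftover)
```

Running the same check on both the frozen graph and the transformed graph would show at which step, if any, the attribute comes back.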
My freeze_graph command is:
bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=graph.pb \
--input_checkpoint="Epoch_(0)_(100of962).ckpt" \
--output_graph=/tmp/frozen_graph.pb --output_node_names=a2b_generator/Tanh
My transform_graph command is:
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=/tmp/frozen_graph.pb \
--out_graph=/tmp/optimized_graph.pb \
--inputs='Placeholder' \
--outputs='a2b_generator/Tanh' \
--transforms='
strip_unused_nodes(type=float, shape="1,299,299,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_constants(ignore_errors=true)
fold_batch_norms'
My mvNCCompile command is:
mvNCCompile /tmp/optimized_graph.pb -in Placeholder -on a2b_generator/Tanh
All files here: https://drive.google.com/drive/folders/1QKptbWQPqS974bcSfTo_rFYAbuLkhLIt?usp=sharing
- graph.pb is the GraphDef proto
- frozen_graph.pb is the output of freeze_graph
- optimized_graph.pb is the output of transform_graph and the input to the mvNCCompile command
@ginnsandrew At the moment, the NCSDK doesn't support Generative Adversarial Networks.
@Tome_at_Intel For all intents and purposes, a GAN is just a way of training a convolutional network.
Is the error I'm getting specific to the use of a GAN? During inference the network should just look like a convolutional net. As far as I can tell, the error is due to a mismatch between the TF versions used for training and inference. Does the NCSDK use something other than Python 2.7 and TF 1.6?
@Tome_at_Intel
So it turns out the previous error was caused by the use of a different TF version when freezing and transforming my graph file. Using TF 1.6 for the freeze_graph and transform_graph fixed it.
I now get a new error:
mvNCCompile /tmp/optimized_graph.pb -in Placeholder -on a2b_generator/Tanh
mvNCCompile v02.00, Copyright @ Intel Corporation 2017
/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py:871: DeprecationWarning: builtin type EagerTensor has no __module__ attribute
/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/tf_inspect.py:45: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/tf_inspect.py:45: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
shape: [1, 256, 256, 3]
res.shape: (1, 256, 256, 3)
TensorFlow output shape: (256, 256, 3)
Traceback (most recent call last):
File "/usr/local/bin/mvNCCompile", line 156, in <module>
create_graph(args.network, args.inputnode, args.outputnode, args.outfile, args.nshaves, args.inputsize, args.weights, args.explicit_concat, args.ma2480, args.scheduler, args)
File "/usr/local/bin/mvNCCompile", line 137, in create_graph
load_ret = load_network(args, parser, myriad_config)
File "/usr/local/bin/ncsdk/Controllers/Scheduler.py", line 95, in load_network
network.optimize()
File "/usr/local/bin/ncsdk/Models/Network.py", line 250, in optimize
self.convert_network_input_to_yxz()
File "/usr/local/bin/ncsdk/Models/Network.py", line 337, in convert_network_input_to_yxz
if self.stageslist[0].op in [StageType.fully_connected_layer, StageType.convolution, StageType.max_pooling,
IndexError: list index out of range
@ginnsandrew Apologies, I meant that we don't have a GAN example for the NCSDK at the moment. For Python, the NCSDK can be used with Python 3.5 also. I am looking into your issue and I'll get back to you as soon as I find something. Thanks.
@Tome_at_Intel Thanks. My latest files are here: https://drive.google.com/drive/folders/1U_sw-P-qYZ4ACtso5HqI0thcmbCmOa1H?usp=sharing
graph.pb - GraphDef proto
frozen_graph.pb - Output from freeze_graph
optimized_graph.pb - Output from transform_graph
Python 2.7.12, TF 1.6, Bazel 0.11.0, NCSDK 2.04, Ubuntu 16.04.4
Commands used:
bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=graph.pb \
--input_checkpoint="Epoch_(0)_(100of962).ckpt" \
--output_graph=/tmp/frozen_graph.pb --output_node_names=a2b_generator/Tanh
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=/tmp/frozen_graph.pb \
--out_graph=/tmp/optimized_graph.pb \
--inputs='Placeholder' \
--outputs='a2b_generator/Tanh' \
--transforms='
strip_unused_nodes(type=float, shape="1,299,299,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_constants(ignore_errors=true)
fold_batch_norms'
mvNCCompile /tmp/optimized_graph.pb -in Placeholder -on a2b_generator/Tanh
@ginnsandrew Just wanted to give you an update. It looks like while parsing the graph file for the model, the NCSDK was not able to find any of the ops. I'm not sure why this is happening, because I know for a fact that we do support some of these ops. However, while debugging the model, I tried printing out the nodes from the stageslist list inside of Network.py and it was empty. That's why you receive a "list index out of range" error. I found this strange, because when I used a separate script to read and print the nodes from the model, they were all there.
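A standalone script for reading and printing the nodes of a model could look roughly like the sketch below. This is not the script that was actually used; it assumes the TensorFlow 1.x API (tf.GraphDef and tf.gfile.GFile), and the helper names load_graph_def and op_histogram as well as the stand-in graph are made up for illustration.

```python
from collections import Counter
from types import SimpleNamespace


def load_graph_def(path):
    """Parse a frozen .pb file into a GraphDef (TensorFlow 1.x API)."""
    import tensorflow as tf  # imported lazily so the helper below runs without TF
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return graph_def


def op_histogram(graph_def):
    """Count how many nodes of each op type the graph contains."""
    return Counter(node.op for node in graph_def.node)


# Hypothetical stand-in for a parsed GraphDef, used only to exercise the helper.
demo_graph = SimpleNamespace(node=[
    SimpleNamespace(name="Placeholder", op="Placeholder"),
    SimpleNamespace(name="conv1", op="Conv2D"),
    SimpleNamespace(name="conv2", op="Conv2D"),
    SimpleNamespace(name="a2b_generator/Tanh", op="Tanh"),
])

for node in demo_graph.node:
    print("%s (%s)" % (node.name, node.op))
print(op_histogram(demo_graph).most_common())
```

With TensorFlow installed, the stand-in would be replaced by something like load_graph_def("/tmp/optimized_graph.pb"). The per-op counts make it easier to spot op types that the NCSDK parser may not recognise.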
@Tome_at_Intel Thanks for looking into it, I really appreciate it. I actually think it was a problem with the way I was saving the graphs. For some reason the standard freeze_graph tools don't seem to work with graphs that contain BatchNorm layers (which mine do).
With my new files I no longer get the "list index out of range" error.
My new graph, optimized_graph.pb, instead produces this error:
mvNCCompile /media/sf_vBox/optimized_graph.pb -in inputA -on a2b_generator/output_image
/usr/local/bin/ncsdk/Controllers/Parsers/TensorFlowParser/Convolution.py:44: SyntaxWarning: assertion is always true, perhaps remove parentheses?
assert(False, "Layer type not supported by Convolution: " + obj.type)
mvNCCompile v02.00, Copyright @ Intel Corporation 2017
/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/tf_inspect.py:45: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
shape: [1, 256, 256, 3]
Traceback (most recent call last):
File "/usr/local/bin/mvNCCompile", line 169, in <module>
create_graph(args.network, args.image, args.inputnode, args.outputnode, args.outfile, args.nshaves, args.inputsize, args.weights, args.explicit_concat, args.ma2480, args.scheduler, args.new_parser, args)
File "/usr/local/bin/mvNCCompile", line 148, in create_graph
load_ret = load_network(args, parser, myriad_config)
File "/usr/local/bin/ncsdk/Controllers/Scheduler.py", line 100, in load_network
parse_ret = parse_tensor(arguments, myriad_conf)
File "/usr/local/bin/ncsdk/Controllers/TensorFlowParser.py", line 319, in parse_tensor
item_shape = output_item.shape.as_list()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_shape.py", line 820, in as_list
raise ValueError("as_list() is not defined on an unknown TensorShape.")
ValueError: as_list() is not defined on an unknown TensorShape.
The new files can be found here: https://github.com/andrewginns/CycleGAN-Tensorflow-PyTorch/releases/tag/tf1.7-py3.6.4
Instructions to reproduce what I'm doing here: https://github.com/andrewginns/MSc-Project