Using an NVIDIA DIGITS-generated model with the NCS

idata · ‎07-03-2018

Hello,

I just picked up the NCS and have gotten several pre-built models and packages running on it. I have followed the instructions to load Tiny YOLO and MobileNet onto the device, both from my laptop and from a Raspberry Pi 3. I'm now attempting to use models I've already been working with to test out the device. These models were built from bvlc_googlenet.caffemodel, using NVIDIA DIGITS to train the network.

When I run mvNCCompile deploy.prototxt -w snapshot_iter_11970.caffemodel -s 12 -o graph, I get the following response:

mVNCCompile v02.00, Copyright @

[Warning: 37] Output layer's name (bbox/regressor) must match its top (bboxes)

[Error 17] Toolkit Error: Internal Error: Could not build graph. Missing link: transformed_data

I've not been able to find anything that is directly relevant.
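The warning above says that a layer's name must match its top. A minimal sketch of how one might scan a prototxt for such mismatches is shown here, in plain Python with no Caffe dependency. The helper names (split_layers, quoted_values, mismatched_tops) and the sample network are hypothetical, and the rule applied (flag any top that differs from the layer name unless the layer runs in place) is an assumption about what the compiler expects, not documented NCSDK behavior.

```python
import re

def split_layers(prototxt_text):
    """Return the body text of each `layer { ... }` block (brace matching only)."""
    bodies = []
    for match in re.finditer(r'\blayer\s*\{', prototxt_text):
        depth, pos = 1, match.end()
        while depth and pos < len(prototxt_text):
            depth += (prototxt_text[pos] == '{') - (prototxt_text[pos] == '}')
            pos += 1
        end = pos - 1 if depth == 0 else pos
        bodies.append(prototxt_text[match.end():end])
    return bodies

def quoted_values(body, key):
    """All quoted string values of `key:` inside one layer body."""
    return re.findall(r'\b%s:\s*"([^"]*)"' % key, body)

def mismatched_tops(prototxt_text):
    """List (layer name, top) pairs where the top differs from the name.

    In-place layers (top equal to one of the bottoms, e.g. ReLU) are skipped.
    """
    issues = []
    for body in split_layers(prototxt_text):
        names = quoted_values(body, 'name')
        if not names:
            continue
        bottoms = quoted_values(body, 'bottom')
        for top in quoted_values(body, 'top'):
            if top != names[0] and top not in bottoms:
                issues.append((names[0], top))
    return issues

# Made-up three-layer network, for illustration only.
SAMPLE = '''
layer {
  name: "deploy_transform"
  type: "Power"
  bottom: "data"
  top: "transformed_data"
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "transformed_data"
  top: "conv1"
  convolution_param { num_output: 8 }
}
layer {
  name: "conv1/relu"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
'''

print(mismatched_tops(SAMPLE))
```

Run against a real deploy.prototxt, this would list candidate layers to rename by hand. It does not rewrite the file.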

Best regards,

Michael

idata · ‎07-03-2018

@mascenzi Can you provide a link to your network for testing? When using your own model, please make sure to remove all dropout layers and training related layers as mentioned in https://movidius.github.io/ncsdk/tf_compile_guidance.html and https://movidius.github.io/ncsdk/tf_slim.html.

idata · ‎07-18-2018

@Tome_at_Intel

Hello, sorry for the delay in getting back to you. Below you will find a link to the model that was trained via NVIDIA DIGITS for vehicle detection. I'm going to run through the links you provided. I'm very new at this, so please excuse my ignorance. When you suggested removing the dropout layers, is this done in the prototxt file? And is it done after the model has been trained, not before training?

https://1drv.ms/f/s!Ak_Up0H34vXsja1VQvv3ZIB1127sXg

idata · ‎07-18-2018

@Tome_at_Intel

Hello again. I wanted to let you know I read through the two links you provided. This may be my ignorance speaking again, but I don't understand how to apply the provided documentation to my case.

I read through this link: https://movidius.github.io/ncsdk/caffe.html which talks about Caffe support for the Movidius. The links you provided cover TensorFlow models. It's my understanding that NVIDIA DIGITS, while it does use TensorFlow, outputs the models as a Caffe network.

Looking through the Caffe support page, it looks very much like the model I have could fall in line with the requirements. I looked through it and it didn't say anything about dropout or training layers.

As you were probably already fully aware, I'm assuming that while the TensorFlow format is not relevant, I still need to remove the training and dropout layers. So if I'm correct, this would be the only dropout layer in the deploy.prototxt file.

layer {
  name: "pool5/drop_s1"
  type: "Dropout"
  bottom: "inception_5b/output"
  top: "pool5/drop_s1"
  dropout_param {
    dropout_ratio: 0.40000000596
  }
}

I couldn't actually find any training layers. I used this link http://caffe.berkeleyvision.org/tutorial/layers.html to research each of the layers in the deploy.prototxt file. It appears that deploy.prototxt is meant for just that, being deployed, while original.prototxt has references to training layers. So I think those are already removed.

Would it be safe to assume that I just need to remove the dropout layer?

Thank you for your time and help.

Best,

Michael

idata · ‎07-18-2018

@Tome_at_Intel

Just a heads up: after removing the dropout layer and attempting to compile the network for Movidius, I get this.

__@__-l1016:~/movidius$ mvNCCompile -s 12 deploy.prototxt -w snapshot_iter_11970.caffemodel -o graph

mvNCCompile v02.00, Copyright @ Movidius Ltd 2016

[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 1:7: Message type "caffe.NetParameter" has no field named "Winput".

WARNING: Logging before InitGoogleLogging() is written to STDERR

F0718 15:12:49.594372 18341 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: deploy.prototxt

*** Check failure stack trace: ***

Aborted (core dumped)

idata · ‎07-20-2018

@mascenzi Which NCSDK version are you testing this with?

idata · ‎07-20-2018

@mascenzi I was able to modify the prototxt to get rid of the errors; however, it seems that this model's input size (384 x 1248) is too large to work on the NCS. This network seems to require 125 MB of memory to process, and the NCS only sets aside 100 MB for network processing.

Link to the modified prototxt file.

idata · ‎07-23-2018

@Tome_at_Intel

OK, this is good. So, correct me if I'm wrong, but what you're saying is that I need to work with smaller images? So if I wanted to retrain this particular model with the same images, I should reduce the size of the images first. Typically, what image size should I be looking at so that the network uses less than 100 MB of memory?

idata · ‎07-23-2018

@Tome_at_Intel

Tome,

I just finished reviewing the changes you made to the deploy.prototxt file. I wanted to ask some questions.

Just so we are on the same page, I took both files and ran diff on the Linux command line. Here are the changes I found.

0a1
> name: user_nvidia_digits
12c13
<   top: "transformed_data"
---
>   top: "deploy_transform"
20c21
<   bottom: "transformed_data"
---
>   bottom: "deploy_transform"
2118c2119
<   bottom: "pool5/drop_s1"
---
>   bottom: "inception_5b/output"
2150,2151c2151,2152
<   bottom: "pool5/drop_s1"
<   top: "bboxes"
---
>   bottom: "inception_5b/output"
>   top: "bbox/regressor"
2172a2174
>

So I'm going to post my impressions; wave me off if I'm off base.

0a1 - Here we are just adding a name to the overall prototxt file, correct? What's the significance of this?

12c13 & 20c21 - There doesn't seem to be any real significance here, as you are just changing the names along the path. No values change. In either file the path is the same, just with different names.

2118c2119 & 2150,2151c2151,2152 - After the last round of convolution layers, there is a concat layer which combines 4 convolution layers. In the original deploy.prototxt, the concat layer is passed to a pooling layer first and then on down two paths. One path is a final convolution layer, and the other path is a convolution layer and then a sigmoid layer.

However, you have changed the path so that the concat layer is passed on down three paths. The major change is that the concat layer is no longer pooled before the final layers: the pooling still takes place, but the concat layer is passed directly to the final convolution layers. Could you explain the significance here?

Just as important: if I redo the training with smaller image sizes, as I mentioned in my previous post, and get a new model, will the changes you made be the same for the new model? And would they be the only changes I need in order to compile a graph for the Movidius?

idata · ‎07-23-2018

@mascenzi Hi, no problem. Most of the changes I made were to eliminate the dropout layer. I believe the NCSDK just ignores the dropout layer. Let me explain the changes that I made in more detail:

0a1 - For the name of the network, I just named the model because it can get confusing when working with many models.

12c13 & 20c21 - Although this change may not affect much, the NCSDK compiler likes this format (top parameter with the same name as the layer name) so that is why I made this change.

2118c2119 & 2150,2151c2151,2152 - Since the dropout layer is used only for training, I piped the bottom layer "inception_5b/output" straight through to the cvg/classifier and bbox/regressor layers to bypass the dropout layer. You mention pooling the concat layer, but I only see a dropout layer after the concat layer.
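The bypass described above (drop the Dropout layer and point its consumers at the layer that fed it) can also be done programmatically. The following is only a sketch under stated assumptions: it treats the prototxt as plain text, the function names layer_spans and bypass_dropout are hypothetical, the sample network is made up around the layer names from this thread, and chains of several consecutive Dropout layers are not handled.

```python
import re

def layer_spans(text):
    """Yield (start, end) character spans of each `layer { ... }` block."""
    for match in re.finditer(r'\blayer\s*\{', text):
        depth, pos = 1, match.end()
        while depth and pos < len(text):
            depth += (text[pos] == '{') - (text[pos] == '}')
            pos += 1
        yield match.start(), pos

def bypass_dropout(text):
    """Remove Dropout layers and rewire their consumers to the dropout's bottom."""
    rename = {}            # dropout top -> dropout bottom
    kept, cursor = [], 0
    for start, end in layer_spans(text):
        block = text[start:end]
        if not re.search(r'\btype:\s*"Dropout"', block):
            continue
        bottom = re.search(r'\bbottom:\s*"([^"]*)"', block).group(1)
        top = re.search(r'\btop:\s*"([^"]*)"', block).group(1)
        if top != bottom:  # in-place dropout needs no rewiring
            rename[top] = bottom
        kept.append(text[cursor:start])
        cursor = end
    kept.append(text[cursor:])
    result = ''.join(kept)
    for old, new in rename.items():
        pattern = r'(\bbottom:\s*)"%s"' % re.escape(old)
        result = re.sub(pattern, lambda m: m.group(1) + '"' + new + '"', result)
    return result

# Made-up fragment using the layer names quoted in this thread.
SAMPLE = '''
layer {
  name: "inception_5b/output"
  type: "Concat"
  bottom: "a"
  bottom: "b"
  top: "inception_5b/output"
}
layer {
  name: "pool5/drop_s1"
  type: "Dropout"
  bottom: "inception_5b/output"
  top: "pool5/drop_s1"
  dropout_param {
    dropout_ratio: 0.4
  }
}
layer {
  name: "cvg/classifier"
  type: "Convolution"
  bottom: "pool5/drop_s1"
  top: "cvg/classifier"
}
'''

cleaned = bypass_dropout(SAMPLE)
print(cleaned)
```

The trained weights are untouched by this edit, since a Dropout layer has no learned parameters. Only the deploy description changes.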

idata · ‎07-23-2018

@Tome_at_Intel

This is great information. Thank you! It makes perfect sense, especially after reading other documentation.

About the image size: will reducing the image size allow the Movidius compiler to work correctly? Or is it the number of images that it was trained on that matters?

idata · ‎07-23-2018

@mascenzi Reducing the input size of the network will likely reduce the amount of memory needed to compile the network. The number of images used to train the network will only affect the model's overall accuracy and won't have any effect on its ability to run on the NCS.

idata · ‎07-23-2018

@Tome_at_Intel

Excellent, that's what I assumed as well. I'm retraining the network right now with a reduced image size of 936x288, only a 25% reduction in each dimension, but I'm assuming that should be enough. How do I go about determining the amount of memory the network will require?
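The thread does not say how the 125 MB figure was measured, so the following is only a back-of-envelope sketch. It assumes that the reported memory is dominated by activation buffers and therefore scales roughly with the number of input pixels. That is an assumption, since weights do not shrink with the input, and the compiler's own report remains the only reliable number. The function name scaled_memory_mb is hypothetical.

```python
def scaled_memory_mb(reported_mb, old_size, new_size):
    """Scale a reported memory figure by the ratio of input pixel counts.

    Sizes are (height, width) tuples. Assumes memory grows linearly with
    pixel count, which ignores the fixed cost of the weights.
    """
    old_h, old_w = old_size
    new_h, new_w = new_size
    return reported_mb * (new_h * new_w) / (old_h * old_w)

# 125 MB was reported in this thread for a 384 x 1248 input.
estimate = scaled_memory_mb(125, (384, 1248), (288, 936))
print(round(estimate, 1))  # 70.3
```

A 25% cut in each dimension removes about 44% of the pixels, so under this assumption the estimate lands below the 100 MB mentioned earlier. Whether the real network fits still has to be confirmed by running the compiler.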

The network will finish training in a few hours. I'll update the post tomorrow when I'm able to test it with the movidius compiler.

Thank you again for the support. It's much appreciated.

idata · ‎07-23-2018

@mascenzi You're welcome. I haven't come across models with really large input sizes that work on the NCS. I think the largest working one I've seen is Tiny Yolo V1 with an input size of 448x448. Good luck!

idata · ‎07-24-2018

@Tome_at_Intel

So no go. When I run the compiler I get more errors.

mvNCCompile deploy.prototxt -w snapshot_iter_19140.caffemodel -s 12 -o graph

mvNCCompile v02.00, Copyright @ Movidius Ltd 2016

[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 1:7: Expected string.

WARNING: Logging before InitGoogleLogging() is written to STDERR

F0724 08:49:15.942924 21277 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: deploy.prototxt

*** Check failure stack trace: ***

Aborted (core dumped)

Here is a link to both the original prototxt file and the edited file.

https://1drv.ms/f/s!Ak_Up0H34vXsja1rWelfxmYXIBsrQQ

I know you mentioned that you haven't seen the larger models work with the NCS. So rather than train using the Detectnet model, I'm going to switch gears a bit and see if I can get it trained using the MobileNet model.

If you have any suggestions for the original, that would be great.

idata · ‎07-24-2018

@mascenzi I was able to get around the issue using quotes around your model's name. Example: name: "model name". After I did that, I received a new error saying "ImportError: No module named 'caffe.layers'". I haven't seen this error before, but it could be related to using different Caffe versions. I'll do some more digging and let you know.
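The "1:7: Expected string" message points at line 1, column 7, which is where the value after `name: ` begins, so it is consistent with an unquoted name. A small sketch of the quoting fix follows. The function name quote_net_name is hypothetical, and the sketch assumes the network name sits on an unindented `name:` line.

```python
import re

def quote_net_name(prototxt_text):
    """Wrap the first unquoted, unindented `name:` value in double quotes."""
    return re.sub(
        r'^(name:\s*)([^"\s][^\n]*?)\s*$',
        r'\1"\2"',
        prototxt_text,
        count=1,
        flags=re.MULTILINE,
    )

before = 'name: user_nvidia_digits\ninput: "data"\n'
after = quote_net_name(before)
print(after)
```

A name that is already quoted is left unchanged, so the function is safe to run more than once.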

idata · ‎07-24-2018

@Tome_at_Intel

Yep, I did the same thing and came up with the same error.

Thank you!

idata · ‎07-25-2018

@Tome_at_Intel

Hi, just wanted to touch base with you. One of the things I've been working on is generating a model for MobileNetV1 from DIGITS. I know MobileNetV1 is compatible with NCSDK2 and thought that it might relieve some of the issues. I finally got it working this morning and am now able to generate a Caffe model via DIGITS based on the MobileNetV1 model description. After a couple of epochs, I figured I would test out the Movidius compiler with the new prototxt and caffemodel. I'm getting what appears to be the same result as when I used the DetectNet model: the same ImportError: No module named 'caffe.layers'.

mvNCProfile -s 12 deploy1.prototxt -w snapshot_iter_3190.caffemodel

/usr/local/bin/ncsdk/Controllers/Parsers/TensorFlowParser/Convolution.py:44: SyntaxWarning: assertion is always true, perhaps remove parentheses?

assert(False, "Layer type not supported by Convolution: " + obj.type)

ImportError: No module named 'caffe.layers'

Traceback (most recent call last):

File "/usr/local/bin/mvNCProfile", line 156, in

profile_net(args.network, args.inputnode, args.outputnode, args.nshaves, args.inputsize, args.weights, args.device_no, args.explicit_concat, args.ma2480, args.scheduler, args)

File "/usr/local/bin/mvNCProfile", line 135, in profile_net

load_ret = load_network(args, parser, myriad_config)

File "/usr/local/bin/ncsdk/Controllers/Scheduler.py", line 103, in load_network

parse_ret = parse_caffe(arguments, myriad_conf)

File "/usr/local/bin/ncsdk/Controllers/CaffeParser.py", line 351, in parse_caffe

net = caffe.Net(description, weights, caffe.TEST)

SystemError: returned NULL without setting an error

Here is a link to the new prototxt and caffemodel.

https://1drv.ms/f/s!Ak_Up0H34vXsja11SkxtzYANB0zv-g

(EDIT)

You know what, I realized I didn't attempt to compile the graph; I used mvNCProfile instead. Here is the result of mvNCCompile:

mvNCCompile -s 12 deploy1.prototxt -w snapshot_iter_3190.caffemodel

/usr/local/bin/ncsdk/Controllers/Parsers/TensorFlowParser/Convolution.py:44: SyntaxWarning: assertion is always true, perhaps remove parentheses?

assert(False, "Layer type not supported by Convolution: " + obj.type)

ImportError: No module named 'caffe.layers'

Traceback (most recent call last):

File "/usr/local/bin/mvNCCompile", line 169, in

create_graph(args.network, args.image, args.inputnode, args.outputnode, args.outfile, args.nshaves, args.inputsize, args.weights, args.explicit_concat, args.ma2480, args.scheduler, args.new_parser, args)

File "/usr/local/bin/mvNCCompile", line 148, in create_graph

load_ret = load_network(args, parser, myriad_config)

File "/usr/local/bin/ncsdk/Controllers/Scheduler.py", line 103, in load_network

parse_ret = parse_caffe(arguments, myriad_conf)

File "/usr/local/bin/ncsdk/Controllers/CaffeParser.py", line 351, in parse_caffe

net = caffe.Net(description, weights, caffe.TEST)

SystemError: returned NULL without setting an error
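The thread does not identify the cause of this ImportError. One thing that may be worth ruling out, offered as an assumption rather than a diagnosis, is whether the deploy prototxt contains layers of type "Python" that import a module the Caffe build used by the NCSDK cannot find. The sketch below only lists such layers from the text of the file. The function name python_layers, the sample layer, and the module name in it are hypothetical.

```python
import re

def python_layers(prototxt_text):
    """Return (layer name, module) for every layer whose type is "Python"."""
    found = []
    for match in re.finditer(r'\blayer\s*\{', prototxt_text):
        depth, pos = 1, match.end()
        while depth and pos < len(prototxt_text):
            depth += (prototxt_text[pos] == '{') - (prototxt_text[pos] == '}')
            pos += 1
        block = prototxt_text[match.end():pos]
        if not re.search(r'\btype:\s*"Python"', block):
            continue
        name = re.search(r'\bname:\s*"([^"]*)"', block)
        module = re.search(r'\bmodule:\s*"([^"]*)"', block)
        found.append((name.group(1) if name else None,
                      module.group(1) if module else None))
    return found

# Made-up layers; the module name is illustrative, not taken from the thread.
SAMPLE = '''
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
}
layer {
  name: "cluster"
  type: "Python"
  bottom: "conv1"
  top: "cluster"
  python_param {
    module: "caffe.layers.example_module"
    layer: "ExampleLayer"
  }
}
'''

print(python_layers(SAMPLE))
```

If the list is empty for the real deploy1.prototxt, this possibility can be set aside and the Caffe version mismatch suggested earlier in the thread remains the more likely lead.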