Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Disable Separable Convolution fusion from mvNCCompile?

idata
Employee

I want to compile the Xception network to run on the NCS. Unfortunately, the network seems too big to fit on the stick, so I got this error:

 

[Error 35] Setup Error: Not enough resources on Myriad to process this network.

 

After looking into where it happens, it turns out that every time the parser finds a separable conv. layer, it attempts to fuse the depthwise and pointwise conv operations together. At some layers the fused operation becomes too big to handle within the Myriad memory, so the error is thrown.

 

I found a hack, but I'm not sure about its consequences. In the file /usr/local/bin/ncsdk/Models/NetworkStage.py, around line 394, I disabled the if condition so that it no longer fuses. The model now compiles nicely; however, its outputs are just NaNs. That is why I think my hack went terribly wrong.

 

If I re-enable the fusing and just place the output layer somewhere in the middle of the network (so that memory is sufficient), the output this time is valid.

 

So my question is: is there any "safe hack" to disable layer fusion so that compiling a slightly larger network is no longer a problem? This would be totally fine with the NCS; it is just a bit slower to do the depthwise and then the pointwise conv separately. And indeed, even if mvNCCompile parses the network successfully, layer fusion can create trouble later on, at runtime. In my case, when I enable fusion and set the output at the 10th layer, the network still compiles, but at runtime it reports Matmul scratch memory [204800] lower than required [239882]. So layer fusion is a double-edged sword.

 

The code snippet for layer fusion is the following (taken from the NCSDK):

 

if (stage.op == StageType.convolution and
        self.op == StageType.depthwise_convolution and
        stage.radixX == 1 and stage.radixY == 1 and
        self.postOp == StageType.none):
    print('Fusing depthconv and conv in', self.unprocessed_name, 'and', stage.unprocessed_name)
    # Create the weights for a convolution that does depthwise convolution (inCH, outCH, kH, kW)
    taps = np.zeros([self.inputDimZ, self.tapDimZ, self.radixY, self.radixX], np.float32)
    multiplier = int(self.tapDimZ / self.tapDimY)
    for y in range(self.radixY):
        for x in range(self.radixX):
            for c in range(self.tapDimY):
                for i in range(multiplier):
                    taps[c, c * multiplier + i, y, x] = self.taps[y, x, c, i]
    # Turn them to [kH, kW, inCH, outCH] in order to be able to use matmul
    taps = taps.transpose(2, 3, 0, 1)
    # Fuse the weights of the following 1x1 convolution into the just created weights
    stage.taps = np.matmul(taps, stage.taps[0, 0])
    # Bring some data from the previous stage (self) to this one (stage) as we are saving this one.
    # Saving the previous node would be simpler, but unfortunately the parser keeps track
    # of what's the latest created node (stage), so we must keep it
    stage.inputDimX = self.inputDimX
    stage.inputDimY = self.inputDimY
    stage.inputDimZ = self.inputDimZ
    stage.inputStrideX = self.inputStrideX
    stage.inputStrideY = self.inputStrideY
    stage.inputStrideZ = self.inputStrideZ
    stage.tapDimX = self.tapDimX
    stage.tapDimY = self.tapDimY
    stage.radixX = self.radixX
    stage.radixY = self.radixY
    stage.strideX = self.strideX
    stage.strideY = self.strideY
    stage.padStyle = self.padStyle
    stage.top = self.top
    stage.data = self.data
    stage.dataIndex = self.dataIndex
    stage.dataPointer = self.dataPointer
    # Remove self from the network and change references
    self.network.count = self.network.count - 1
    self.network.stageslist.remove(self)
    stage.top = self.top
    if self in self.network.head:
        stage.network.storageOrder = stage.storageOrder
        self.network.head.remove(self)
        self.network.head.append(stage)
    else:
        for parents in self.network.search_several(self.top):
            newtail = []
            for p in parents.tail:
                if p == self:
                    newtail.append(stage)
            parents.tail = newtail
    return
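To check my own understanding of what this snippet does, here is a small standalone numpy sketch (the shapes and variable names are mine, not the SDK's) that rebuilds the fused weights the same way and verifies, per tap position, that depthwise-then-pointwise equals a single matmul with the fused weights. So the fusion math itself looks sound; my guess is that the trouble comes from the memory the fused layer needs.

import numpy as np

# Hypothetical shapes, just for the check.
inCH, mult, outCH, kH, kW = 8, 1, 16, 3, 3
dw = np.random.randn(kH, kW, inCH, mult).astype(np.float32)        # depthwise taps [kH, kW, inCH, multiplier]
pw = np.random.randn(1, 1, inCH * mult, outCH).astype(np.float32)  # pointwise 1x1 taps

# Rebuild the fused weights the same way the SDK snippet does.
taps = np.zeros([inCH, inCH * mult, kH, kW], np.float32)
for y in range(kH):
    for x in range(kW):
        for c in range(inCH):
            for i in range(mult):
                taps[c, c * mult + i, y, x] = dw[y, x, c, i]
taps = taps.transpose(2, 3, 0, 1)          # -> [kH, kW, inCH, inCH*mult]
fused = np.matmul(taps, pw[0, 0])          # -> [kH, kW, inCH, outCH]

# Convolution is linear, so it is enough to check each tap position separately:
# a channel vector pushed through depthwise-then-pointwise must match one matmul
# with the fused weight matrix at that position.
v = np.random.randn(inCH).astype(np.float32)
for y in range(kH):
    for x in range(kW):
        inter = (v[:, None] * dw[y, x]).reshape(-1)   # depthwise output at (y, x)
        two_step = inter @ pw[0, 0]                   # followed by the 1x1 convolution
        one_step = v @ fused[y, x]                    # single fused convolution
        assert np.allclose(two_step, one_step, atol=1e-4)
print('fused weights reproduce depthwise + pointwise')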
6 Replies
idata
Employee

@dpvo Hm. A possible workaround is to find where the convolutions are "too big" and reduce the dimensions of the input to that layer. Can you provide your model?

idata
Employee

Thanks @Tome_at_Intel ! I have tried reducing the input image dimension from 299 x 299 down to 227 x 227, but it still runs out of memory. I guess reducing it further will severely affect the model's performance. The thing is, if I disable the fusion of the depthwise conv. layer and the pointwise conv. layer (by commenting out the if condition in /usr/local/bin/ncsdk/Models/NetworkStage.py at line 394 or thereabouts), then the compilation goes fine. It's just that NaNs start to appear after several layers and then spread all over the higher layers.

 

You can find all the resources (output blobs saved as numpy txt files, plots, and NCS model file .graph) from the following Dropbox folder: https://www.dropbox.com/sh/xk1jyzte90ejnyv/AACVSrb-XK6rh6TGoHtbFvP3a?dl=0

 

Below I show some of my efforts at debugging this issue.

 

For a reference on the Xception architecture, please see https://github.com/keras-team/keras/blob/master/keras/applications/xception.py
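Since the question is where the fused convolutions get too big, here is a rough sketch (Keras with the TensorFlow backend; the size formula is just my estimate of the [kH, kW, inCH, outCH] tensor that the fuser builds, stored as fp16) that walks the Keras Xception model and prints the size of the weight tensor fusion would create for each SeparableConv2D:

from keras.applications.xception import Xception
from keras.layers import SeparableConv2D

model = Xception(weights=None)   # 299 x 299 x 3 input by default

for layer in model.layers:
    if isinstance(layer, SeparableConv2D):
        dw, pw = layer.get_weights()[:2]   # depthwise kernel, pointwise kernel
        kh, kw, in_ch, mult = dw.shape
        out_ch = pw.shape[-1]
        # Size of the fused [kH, kW, inCH, outCH] weight tensor, in fp16.
        fused_mb = kh * kw * in_ch * out_ch * 2 / 1024.0 / 1024.0
        print('{:20s} fused weights ~{:.1f} MB'.format(layer.name, fused_mb))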

 

Without layer fusion, the first separable convolution layer, block2_sepconv1, is still "fine": there are no NaN values, and its output tensor is mostly identical to that of an Xception model with fusion (I was able to compile the first several layers with fusion so that there was no out-of-memory error). The comparison in terms of output magnitude is shown below.

 

 

But from the second separable convolution layer, block2_sepconv2, NaNs occur, in a small portion (about 2%) of the total number of output entries of that layer (as seen below; I masked out all NaN entries to be able to plot this histogram). At the same time, the absolute values of the output grow as follows.

 

 

In subsequent layers, the number of NaNs grows more quickly, and the output magnitudes grow too, like these

 

 

and

 

 

By block4_sepconv1, all of the output entries are NaN.

 

 

You can have a look at those output tensors, which I saved as txt numpy matrices, in the same Dropbox folder. I actually have no idea how or why the NaNs start to appear. Any suggestion is appreciated!
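For reference, this is roughly how I computed the NaN percentages and plotted the masked histograms from the saved blobs (the file name below is a placeholder for the txt files in the Dropbox folder):

import numpy as np
import matplotlib.pyplot as plt

# Placeholder name; the actual blobs are in the Dropbox folder.
out = np.loadtxt('block2_sepconv2_output.txt')

nan_mask = np.isnan(out)
print('NaN fraction: {:.2%}'.format(nan_mask.mean()))

# Mask the NaN entries before plotting the magnitude histogram.
plt.hist(np.abs(out[~nan_mask]).ravel(), bins=100)
plt.title('|output| with NaNs masked')
plt.show()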

idata
Employee

Essentially, all of my problems can be posed as a single question: does the NCS support depthwise convolution under all circumstances? That is, with only a depthwise convolution and no pointwise convolution afterward (as in SeparableConv2D), does the NCS still support it? I read in the release notes that the operation is supported, but from what I experience, I am not sure, because if the operation were supported, disabling layer fusion would not cause any trouble.
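One way I could try to answer this myself is to isolate the operation: build a tiny TensorFlow graph that contains nothing but a single depthwise convolution, save the meta file, and push it through mvNCCheck. This is just a sketch of the idea (TF 1.x API, all names are mine); I have not run it yet:

import numpy as np
import tensorflow as tf

# A graph containing nothing but one 3x3 depthwise convolution (no pointwise conv after it).
inp = tf.placeholder(tf.float32, [1, 64, 64, 32], name='input')
dw_filter = tf.Variable(np.random.randn(3, 3, 32, 1).astype(np.float32), name='dw_filter')
out = tf.nn.depthwise_conv2d(inp, dw_filter, strides=[1, 1, 1, 1],
                             padding='SAME', name='dw_only')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    tf.train.Saver().save(sess, './dw_only')   # writes dw_only.meta

# and then check it directly on the stick, e.g.:
#   mvNCCheck dw_only.meta -s 12 -in input -on dw_only -is 64 64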

idata
Employee

It turns out that layer fusion may not be the source of the problem; it is something else. I had a chance to fine-tune InceptionV3 on the same data and compile it to the NCS graph format. Testing that graph also gives me NaN outputs at the end of the network. Note that my models work well in their original formats. The NaN values occur from intermediate layers onward, while the values in the first few layers are still valid. My input size is 299 x 299, float16, normalized as img = img/127.5 - 1 (as usual for the InceptionV3 network). The input and output names for all the graphs are input and predictions/Softmax. @Tome_at_Intel, could you please have a look at my model? https://www.dropbox.com/s/nweom7cpnsnqxyq/InceptionV3.graph?dl=0
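For completeness, this is roughly how I run the graph on the stick and feed the input (NCSDK v1 Python API as far as I understand it; the file names are mine, and the preprocessing matches what I described above):

import cv2
import numpy as np
from mvnc import mvncapi as mvnc

# Open the first attached NCS device.
devices = mvnc.EnumerateDevices()
device = mvnc.Device(devices[0])
device.OpenDevice()

# Load the compiled graph file onto the stick.
with open('InceptionV3.graph', 'rb') as f:
    graph = device.AllocateGraph(f.read())

# Preprocess the same way as during training: 299 x 299, RGB, img/127.5 - 1, float16.
img = cv2.imread('example_classid_0.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (299, 299)).astype(np.float32)
img = img / 127.5 - 1.0

graph.LoadTensor(img.astype(np.float16), 'user object')
output, _ = graph.GetResult()
print('NaNs in output: {} / {}'.format(np.isnan(output).sum(), output.size))

graph.DeallocateGraph()
device.CloseDevice()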

idata
Employee

@dpvo I was able to run an inference with your provided graph file and received the same results you did (all NaNs). Can you do me a favor and try using mvNCCheck with your version of the InceptionV3 model and post the results back here? If you could also provide the actual model (meta file), that would be nice as well.

idata
Employee

@Tome_at_Intel I really appreciate your help. Below you can find the log of mvNCCheck. The command I ran was: mvNCCheck ../models/inceptionv3/InceptionV3_noBN.meta -s 12 -in input -on predictions/Softmax -is 299 299 -i example_classid_0.jpg -id 0 -S 127.5 -M 1 -cs 0,1,2

 

mvNCCheck v02.00, Copyright @ Movidius Ltd 2016
Layer conv2d_21/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_27/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_32/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_42/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_52/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_62/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_71/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_79/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_80/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_83/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_84/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_78/BiasAdd forced to im2col_v2, because its output is used in concat
Layer activation_85/Relu forced to im2col_v2, because its output is used in concat
Layer conv2d_88/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_89/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_92/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_93/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_87/BiasAdd forced to im2col_v2, because its output is used in concat
Layer activation_94/Relu forced to im2col_v2, because its output is used in concat
Layer activation_93/Relu forced to im2col_v2, because its output is used in concat
Layer activation_84/Relu forced to im2col_v2, because its output is used in concat
Layer conv2d_61/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_51/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_41/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_31/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_20/BiasAdd forced to im2col_v2, because its output is used in concat
USB: Transferring Data...
USB: Myriad Execution Finished
USB: Myriad Connection Closing.
USB: Myriad Connection Closed.
Result: (1, 1, 45)
1) 44 nan
2) 21 nan
3) 19 nan
4) 18 nan
5) 17 nan
Expected: (1, 45)
1) 5 1.0
2) 18 6.2048e-05
3) 29 1.0788e-05
4) 31 5.9605e-08
5) 44 0.0
------------------------------------------------------------
Obtained values
------------------------------------------------------------
Obtained Min Pixel Accuracy: nan% (max allowed=2%), Fail
Obtained Average Pixel Accuracy: nan% (max allowed=1%), Fail
Obtained Percentage of wrong values: 0.0% (max allowed=0%), Fail
Obtained Pixel-wise L2 error: nan% (max allowed=1%), Fail
Obtained Global Sum Difference: nan
------------------------------------------------------------

 

I will inbox you the link to download the inception meta files.
