Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Disable Separable Convolution fusion from mvNCCompile?

idata
Employee

I want to compile the Xception network to run on the NCS. Unfortunately, the network seems too big to fit on the stick, so I got this error:

 

[Error 35] Setup Error: Not enough resources on Myriad to process this network.

 

After looking into where it happens, it turns out that every time the parser finds a separable conv. layer, it attempts to fuse the depthwise and pointwise conv operations together. At some layers the fused operation becomes too big to handle within the Myriad memory, so the error is thrown.

 

I found a hack, but I'm not sure about its consequences. In the file /usr/local/bin/ncsdk/Models/NetworkStage.py, around line 394, I disabled the if condition so that it no longer fuses. The model now compiles nicely; however, its outputs are just NaNs. That is why I think my hack went terribly wrong.

 

If I re-enable the fusing and just place the output layer somewhere in the middle of the network (so that memory is sufficient), the output this time is valid.

 

So my question is: is there any "safe hack" to disable layer fusion so that compiling a slightly larger network is no longer a problem? This would be totally fine with the NCS; it is just a bit slower to do the depthwise and then the pointwise conv separately. And indeed, even if mvNCCompile parses the network successfully, layer fusion can create trouble later on, at runtime. In my case, when I enable fusion and set the output at the 10th layer, the network still compiles, but at runtime it reports Matmul scratch memory [204800] lower than required [239882]. So layer fusion is a double-edged sword.

 

The code snippet for layer fusion is the following (taken from the NCSDK):

 

if (stage.op == StageType.convolution and
        self.op == StageType.depthwise_convolution and
        stage.radixX == 1 and stage.radixY == 1 and
        self.postOp == StageType.none):
    print('Fusing depthconv and conv in', self.unprocessed_name, 'and', stage.unprocessed_name)
    # Create the weights for a convolution that does depthwise convolution (inCH, outCH, kH, kW)
    taps = np.zeros([self.inputDimZ, self.tapDimZ, self.radixY, self.radixX], np.float32)
    multiplier = int(self.tapDimZ / self.tapDimY)
    for y in range(self.radixY):
        for x in range(self.radixX):
            for c in range(self.tapDimY):
                for i in range(multiplier):
                    taps[c, c * multiplier + i, y, x] = self.taps[y, x, c, i]
    # Turn them to [kH, kW, inCH, outCH] in order to be able to use matmul
    taps = taps.transpose(2, 3, 0, 1)
    # Fuse the weights of the following 1x1 convolution into the just created weights
    stage.taps = np.matmul(taps, stage.taps[0, 0])
    # Bring some data from the previous stage (self) to this one (stage) as we are saving this one.
    # Saving the previous node would be simpler, but unfortunately the parser keeps track
    # of what's the latest created node (stage), so we must keep it
    stage.inputDimX = self.inputDimX
    stage.inputDimY = self.inputDimY
    stage.inputDimZ = self.inputDimZ
    stage.inputStrideX = self.inputStrideX
    stage.inputStrideY = self.inputStrideY
    stage.inputStrideZ = self.inputStrideZ
    stage.tapDimX = self.tapDimX
    stage.tapDimY = self.tapDimY
    stage.radixX = self.radixX
    stage.radixY = self.radixY
    stage.strideX = self.strideX
    stage.strideY = self.strideY
    stage.padStyle = self.padStyle
    stage.top = self.top
    stage.data = self.data
    stage.dataIndex = self.dataIndex
    stage.dataPointer = self.dataPointer
    # Remove self from the network and change references
    self.network.count = self.network.count - 1
    self.network.stageslist.remove(self)
    stage.top = self.top
    if self in self.network.head:
        stage.network.storageOrder = stage.storageOrder
        self.network.head.remove(self)
        self.network.head.append(stage)
    else:
        for parents in self.network.search_several(self.top):
            newtail = []
            for p in parents.tail:
                if p == self:
                    newtail.append(stage)
            parents.tail = newtail
    return
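To check my own understanding of what this snippet does, here is a small standalone numpy sketch (the shapes and variable names are mine, not the SDK's) that rebuilds the fused weights the same way and verifies, per tap position, that depthwise-then-pointwise equals a single matmul with the fused weights. So the fusion math itself looks sound; my guess is that the trouble comes from the memory the fused layer needs.

import numpy as np

# Hypothetical shapes, just for the check.
inCH, mult, outCH, kH, kW = 8, 1, 16, 3, 3
dw = np.random.randn(kH, kW, inCH, mult).astype(np.float32)        # depthwise taps [kH, kW, inCH, multiplier]
pw = np.random.randn(1, 1, inCH * mult, outCH).astype(np.float32)  # pointwise 1x1 taps

# Rebuild the fused weights the same way the SDK snippet does.
taps = np.zeros([inCH, inCH * mult, kH, kW], np.float32)
for y in range(kH):
    for x in range(kW):
        for c in range(inCH):
            for i in range(mult):
                taps[c, c * mult + i, y, x] = dw[y, x, c, i]
taps = taps.transpose(2, 3, 0, 1)          # -> [kH, kW, inCH, inCH*mult]
fused = np.matmul(taps, pw[0, 0])          # -> [kH, kW, inCH, outCH]

# Convolution is linear, so it is enough to check each tap position separately:
# a channel vector pushed through depthwise-then-pointwise must match one matmul
# with the fused weight matrix at that position.
v = np.random.randn(inCH).astype(np.float32)
for y in range(kH):
    for x in range(kW):
        inter = (v[:, None] * dw[y, x]).reshape(-1)   # depthwise output at (y, x)
        two_step = inter @ pw[0, 0]                   # followed by the 1x1 convolution
        one_step = v @ fused[y, x]                    # single fused convolution
        assert np.allclose(two_step, one_step, atol=1e-4)
print('fused weights reproduce depthwise + pointwise')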
6 Replies
idata
Employee

@dpvo Hm. A possible workaround is to find where the convolutions are "too big" and reduce the dimensions of the input to that layer. Can you provide your model?

idata
Employee

Thanks @Tome_at_Intel ! I have tried reducing the input image dimension from 299 x 299 down to 227 x 227, but it still runs out of memory. I guess reducing it further will severely affect the model's performance. The thing is, if I disable the fusion of the depthwise conv. layer and the pointwise conv. layer (by commenting out the if condition in /usr/local/bin/ncsdk/Models/NetworkStage.py at line 394 or thereabouts), then the compilation goes fine. It's just that NaNs start to appear after several layers and then spread all over the higher layers.

 

You can find all the resources (output blobs saved as numpy txt files, plots, and NCS model file .graph) from the following Dropbox folder: https://www.dropbox.com/sh/xk1jyzte90ejnyv/AACVSrb-XK6rh6TGoHtbFvP3a?dl=0

 

Below I show some of my efforts at debugging this issue.

 

For a reference on the Xception architecture, please see https://github.com/keras-team/keras/blob/master/keras/applications/xception.py
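Since the question is where the fused convolutions get too big, here is a rough sketch (Keras with the TensorFlow backend; the size formula is just my estimate of the [kH, kW, inCH, outCH] tensor that the fuser builds, stored as fp16) that walks the Keras Xception model and prints the size of the weight tensor fusion would create for each SeparableConv2D:

from keras.applications.xception import Xception
from keras.layers import SeparableConv2D

model = Xception(weights=None)   # 299 x 299 x 3 input by default

for layer in model.layers:
    if isinstance(layer, SeparableConv2D):
        dw, pw = layer.get_weights()[:2]   # depthwise kernel, pointwise kernel
        kh, kw, in_ch, mult = dw.shape
        out_ch = pw.shape[-1]
        # Size of the fused [kH, kW, inCH, outCH] weight tensor, in fp16.
        fused_mb = kh * kw * in_ch * out_ch * 2 / 1024.0 / 1024.0
        print('{:20s} fused weights ~{:.1f} MB'.format(layer.name, fused_mb))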

 

Without layer fusion, the first separable convolution layer, block2_sepconv1, is still "fine": there are no NaN values, and its output tensor is mostly identical to that of an Xception model with fusion (I was able to compile the first several layers with fusion so that there was no out-of-memory error). The comparison in terms of output magnitude is shown below.

 

 

But from the second separable convolution layer, block2_sepconv2, NaNs occur, in a small portion (about 2%) of the total number of output entries of that layer (as seen below; I masked out all NaN entries to be able to plot this histogram). At the same time, the absolute values of the output grow as follows.

 

 

In subsequent layers, the number of NaNs grows more quickly, and the output magnitudes grow too, like these

 

 

and

 

 

By block4_sepconv1, all of the output entries are NaN.

 

 

You can have a look at those output tensors, which I saved as txt numpy matrices, in the same Dropbox folder. I actually have no idea how or why the NaNs start to appear. Any suggestion is appreciated!
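For reference, this is roughly how I computed the NaN percentages and plotted the masked histograms from the saved blobs (the file name below is a placeholder for the txt files in the Dropbox folder):

import numpy as np
import matplotlib.pyplot as plt

# Placeholder name; the actual blobs are in the Dropbox folder.
out = np.loadtxt('block2_sepconv2_output.txt')

nan_mask = np.isnan(out)
print('NaN fraction: {:.2%}'.format(nan_mask.mean()))

# Mask the NaN entries before plotting the magnitude histogram.
plt.hist(np.abs(out[~nan_mask]).ravel(), bins=100)
plt.title('|output| with NaNs masked')
plt.show()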

idata
Employee

Essentially, all of my problems can be posed as a single question: does the NCS support depthwise convolution under all circumstances? That is, with only a depthwise convolution and no pointwise convolution afterward (as in SeparableConv2D), does the NCS still support it? I read in the release notes that the operation is supported, but from what I experience, I am not sure, because if the operation were supported, disabling layer fusion would not cause any trouble.
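One way I could try to answer this myself is to isolate the operation: build a tiny TensorFlow graph that contains nothing but a single depthwise convolution, save the meta file, and push it through mvNCCheck. This is just a sketch of the idea (TF 1.x API, all names are mine); I have not run it yet:

import numpy as np
import tensorflow as tf

# A graph containing nothing but one 3x3 depthwise convolution (no pointwise conv after it).
inp = tf.placeholder(tf.float32, [1, 64, 64, 32], name='input')
dw_filter = tf.Variable(np.random.randn(3, 3, 32, 1).astype(np.float32), name='dw_filter')
out = tf.nn.depthwise_conv2d(inp, dw_filter, strides=[1, 1, 1, 1],
                             padding='SAME', name='dw_only')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    tf.train.Saver().save(sess, './dw_only')   # writes dw_only.meta

# and then check it directly on the stick, e.g.:
#   mvNCCheck dw_only.meta -s 12 -in input -on dw_only -is 64 64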

idata
Employee

It turns out that layer fusion may not be the source of the problem; it is something else. I had a chance to fine-tune InceptionV3 on the same data and compile it to the NCS graph format. Testing that graph also gives me NaN outputs at the end of the network. Note that my models work well in their original formats. The NaN values occur from intermediate layers onward, while the values in the first few layers are still valid. My input size is 299 x 299, float16, normalized as img = img/127.5 - 1 (as usual for the InceptionV3 network). The input and output names for all the graphs are input and predictions/Softmax. @Tome_at_Intel, could you please have a look at my model? https://www.dropbox.com/s/nweom7cpnsnqxyq/InceptionV3.graph?dl=0
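For completeness, this is roughly how I run the graph on the stick and feed the input (NCSDK v1 Python API as far as I understand it; the file names are mine, and the preprocessing matches what I described above):

import cv2
import numpy as np
from mvnc import mvncapi as mvnc

# Open the first attached NCS device.
devices = mvnc.EnumerateDevices()
device = mvnc.Device(devices[0])
device.OpenDevice()

# Load the compiled graph file onto the stick.
with open('InceptionV3.graph', 'rb') as f:
    graph = device.AllocateGraph(f.read())

# Preprocess the same way as during training: 299 x 299, RGB, img/127.5 - 1, float16.
img = cv2.imread('example_classid_0.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (299, 299)).astype(np.float32)
img = img / 127.5 - 1.0

graph.LoadTensor(img.astype(np.float16), 'user object')
output, _ = graph.GetResult()
print('NaNs in output: {} / {}'.format(np.isnan(output).sum(), output.size))

graph.DeallocateGraph()
device.CloseDevice()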

idata
Employee

@dpvo I was able to run an inference with your provided graph file and received the same results you did (all NaNs). Can you do me a favor and try using mvNCCheck with your version of the InceptionV3 model and post the results back here? If you could also provide the actual model (meta file), that would be nice as well.

idata
Employee

@Tome_at_Intel I really appreciate your help. Below you can find the log of mvNCCheck. The command I ran was: mvNCCheck ../models/inceptionv3/InceptionV3_noBN.meta -s 12 -in input -on predictions/Softmax -is 299 299 -i example_classid_0.jpg -id 0 -S 127.5 -M 1 -cs 0,1,2

 

mvNCCheck v02.00, Copyright @ Movidius Ltd 2016
Layer conv2d_21/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_27/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_32/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_42/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_52/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_62/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_71/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_79/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_80/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_83/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_84/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_78/BiasAdd forced to im2col_v2, because its output is used in concat
Layer activation_85/Relu forced to im2col_v2, because its output is used in concat
Layer conv2d_88/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_89/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_92/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_93/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_87/BiasAdd forced to im2col_v2, because its output is used in concat
Layer activation_94/Relu forced to im2col_v2, because its output is used in concat
Layer activation_93/Relu forced to im2col_v2, because its output is used in concat
Layer activation_84/Relu forced to im2col_v2, because its output is used in concat
Layer conv2d_61/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_51/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_41/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_31/BiasAdd forced to im2col_v2, because its output is used in concat
Layer conv2d_20/BiasAdd forced to im2col_v2, because its output is used in concat
USB: Transferring Data...
USB: Myriad Execution Finished
USB: Myriad Connection Closing.
USB: Myriad Connection Closed.
Result: (1, 1, 45)
1) 44 nan
2) 21 nan
3) 19 nan
4) 18 nan
5) 17 nan
Expected: (1, 45)
1) 5 1.0
2) 18 6.2048e-05
3) 29 1.0788e-05
4) 31 5.9605e-08
5) 44 0.0
------------------------------------------------------------
Obtained values
------------------------------------------------------------
Obtained Min Pixel Accuracy: nan% (max allowed=2%), Fail
Obtained Average Pixel Accuracy: nan% (max allowed=1%), Fail
Obtained Percentage of wrong values: 0.0% (max allowed=0%), Fail
Obtained Pixel-wise L2 error: nan% (max allowed=1%), Fail
Obtained Global Sum Difference: nan
------------------------------------------------------------

 

I will inbox you the link to download the inception meta files.
