As a continuation of https://software.intel.com/en-us/forums/computer-vision/topic/804131
After our custom-built YOLO network failed on the UP AI Core, we decided to try running it on an Intel NCS 2. Unsurprisingly, the result was exactly the same: the network produced complete nonsense as output. After that I decided to fiddle with the .xml file of the converted model. The first thing that caught my eye were these lines:
<layer id="2" name="LeakyReLU_" precision="FP16" type="ReLU">
The negative slope multiplier was hard-coded inside. My next thought was: hmm, maybe some FP16 conversion shenanigans are taking place here. What if I changed this constant throughout the file in the LeakyReLU layers? Instead of this number I put 0.11.
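The substitution I did across the file can be sketched roughly like this (a minimal illustration, not the exact script I used; the file name "frozen_yolo.xml" is just an example):

```python
# Sketch: replace the negative_slope of every LeakyReLU ("ReLU" with a
# negative_slope attribute) in an OpenVINO IR .xml with a new constant.
import xml.etree.ElementTree as ET

def patch_negative_slopes(xml_path, new_slope=0.11):
    tree = ET.parse(xml_path)
    for layer in tree.getroot().iter("layer"):
        if layer.get("type") != "ReLU":
            continue
        data = layer.find("data")
        if data is not None and data.get("negative_slope") is not None:
            data.set("negative_slope", str(new_slope))
    tree.write(xml_path)

# patch_negative_slopes("frozen_yolo.xml")  # hypothetical IR file name
```

Note that this only touches the .xml topology; the .bin weights are untouched, which is why such an edit is cheap to experiment with.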
And voila! The network started outputting valid results.
After some experiments I found that modifying this slope multiplier within fairly large margins (0.09-0.11) still gives reasonable results, but as soon as it is exactly 0.1 (or any number that rounds to 0.1 after conversion) the network outputs random garbage.
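The odd-looking constant in the IR is itself a conversion artifact: 0.10000000149011612 is exactly what you get when 0.1 is rounded to float32, and rounding once more to float16 lands on yet another value. This can be checked with the standard library alone:

```python
# Round a Python float through a narrower IEEE format and back:
# 'f' = float32, 'e' = float16 (half precision).
import struct

def roundtrip(fmt, x):
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

print(roundtrip("f", 0.1))   # 0.10000000149011612 -- the constant in the IR
print(roundtrip("e", 0.1))   # 0.0999755859375     -- what FP16 actually stores
print(roundtrip("e", 0.11))  # close to 0.11, but a different FP16 lattice point
```

So a plugin that special-cases "slope == 0.1" (or mishandles it during optimization) can behave differently depending on which of these representations it compares against.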
Another finding is that setting the constant to exactly 0.1 changes the network execution time! That probably means some trouble occurs at the optimization stage. By varying this constant (by small amounts) in multiple places I could get execution times from 15 ms to 25 ms. Nearly a twofold difference!
And last but not least: there is actually one critical place in the .xml file where the 0.1 constant completely breaks the network output (if you keep 0.1 in the other places, it still more or less works). That place is the LeakyReLU right before the convolution layer with a 1x1 kernel, just before the YOLO region layer:
<layer id="14" name="LeakyReLU_244" precision="FP16" type="ReLU">
    <data negative_slope="0.10000000149011612"/>
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>256</dim>
            <dim>10</dim>
            <dim>26</dim>
        </port>
    </input>
    <output>
        <port id="1">
            <dim>1</dim>
            <dim>256</dim>
            <dim>10</dim>
            <dim>26</dim>
        </port>
    </output>
</layer>
<layer id="15" name="14-convolutional" precision="FP16" type="Convolution">
    <data dilation-x="1" dilation-y="1" group="1" kernel-x="1" kernel-y="1" output="30" pad-b="0" pad-r="0" pad-x="0" pad-y="0" stride="1,1,1,1" stride-x="1" stride-y="1"/>
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>256</dim>
            <dim>10</dim>
            <dim>26</dim>
        </port>
    </input>
    <output>
        <port id="3">
            <dim>1</dim>
            <dim>30</dim>
            <dim>10</dim>
            <dim>26</dim>
        </port>
    </output>
    <blobs>
        <weights offset="785216" size="15360"/>
        <biases offset="800576" size="60"/>
    </blobs>
</layer>
I think this problem might be the cause of all the YOLO-related problems on the NCS2, since all of those models have that LeakyReLU layer with this hard-coded multiplier. It would be nice to hear feedback from the Intel devs on this case!
Our guess is that you do input preprocessing on your end and pass 0..1-normalized input to the Inference Engine. Instead, you should give MO the "--scale 255" option and run inference on the original RGB. Is our guess correct?
Thank you kindly for your response! Yes, your guess is right. If I understand correctly, providing "--scale 255" to MO embeds the scaling in the IR. But if it is a simple scaling that can be done either inside or outside the model, shouldn't the two approaches be equivalent and lead to the same result? What am I missing? Are there radical model-optimization differences between backends, given that my approach works on GPU and CPU, but not on MYRIAD?
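In exact arithmetic the two approaches really are equivalent: a convolution is linear (bias aside), so scaling the input before it equals scaling its output after it. A toy 1-D convolution (my own illustration, not OpenVINO code) shows the identity; the difference must therefore come from finite FP16 precision and from how MO folds the scale into the graph, not from the math:

```python
# Toy 1-D convolution, no padding, no bias, to show the linearity identity
# conv(x / 255) == conv(x) / 255 (exactly, up to floating-point rounding).

def conv1d(x, w):
    return [sum(x[i + j] * w[j] for j in range(len(w)))
            for i in range(len(x) - len(w) + 1)]

x = [12.0, 200.0, 34.0, 90.0, 255.0]   # made-up pixel-range inputs
w = [0.5, -1.0, 0.25]                  # made-up kernel

scaled_first = conv1d([v / 255.0 for v in x], w)
scaled_after = [v / 255.0 for v in conv1d(x, w)]
assert all(abs(a - b) < 1e-12 for a, b in zip(scaled_first, scaled_after))
```

In FP16, however, the intermediate values in the two orderings live in very different ranges (0..1 versus 0..255), so rounding behavior differs, which is presumably where the MYRIAD plugin diverges from CPU/GPU.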
I can confirm Nikolai's findings. I trained a custom YOLOv3 model and followed the instructions exactly as listed for the Model Optimizer. The output is nonsense; adjusting the negative slope to 1.1 improved the output. Trying --scale 255 gave no output at all. So I guess something is still missing here.
I had the same problem when I converted the model without the --data_type FP16 argument (I was just changing all FP32 strings to FP16 in the .xml file, but the .bin file still contained FP32 weights). When I ran the Model Optimizer with --data_type FP16, everything started working fine.
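That failure mode makes sense: editing the precision strings in the .xml only changes how the plugin *interprets* the .bin bytes, it does not convert them. Reinterpreting FP32 bytes as FP16 yields garbage, as this stdlib sketch shows (my illustration of the byte-level effect, not how the plugin is implemented internally):

```python
# Reinterpreting 4 bytes of a float32 weight as two float16 values
# (what effectively happens if the .xml claims FP16 over FP32 data)
# versus a proper numeric conversion.
import struct

w32 = 0.1
raw = struct.pack("f", w32)            # 4 bytes of FP32 data
reinterpreted = struct.unpack("2e", raw)  # read back as two FP16 values
converted = struct.unpack("e", struct.pack("e", w32))[0]  # real FP16 cast

print(reinterpreted)  # two values with no resemblance to 0.1
print(converted)      # 0.0999755859375
```

So the --data_type FP16 flag is doing real work: it rewrites the weights in the .bin, not just the labels in the .xml.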