In semantic segmentation, all detection results become "NaN"

idata · ‎09-13-2018

Hello.

I am trying out semantic segmentation with UNet.

I succeeded in generating the model, but when I run it, every result comes out as "NaN".

I cannot tell whether the problem is in the image preprocessing or in the model.

Could someone please help me?

DL Framework: TensorFlow

Input resolution: 128 x 128

Dataset: Pascal VOC 2012

My Model(CheckPoint and Graph): https://drive.google.com/file/d/1WFeY2VyFS7PKGPTI9oC9oxxVESu4ShZe/view?usp=sharing

idata · ‎09-13-2018

Detailed Per Layer Profile

                               Bandwidth   time
#    Name                    MFLOPs  (MB/s)    (ms)
====================================================
0    conv2d/Relu               56.6   276.3   3.070
1    conv2d_1/Relu           1208.0   667.4  27.076
2    max_pooling2d/MaxPool      1.0   979.8   2.041
3    conv2d_2/Relu            604.0   414.1  11.212
4    conv2d_3/Relu           1208.0   304.5  30.483
5    max_pooling2d_1/MaxPool    0.5   977.2   1.024
6    conv2d_4/Relu            604.0   198.5  14.188
7    conv2d_5/Relu           1208.0   172.7  32.592
8    max_pooling2d_2/MaxPool    0.3   952.9   0.525
9    conv2d_6/Relu            604.0   262.7  12.883
10   conv2d_7/Relu           1208.0   279.4  24.189
11   max_pooling2d_3/MaxPool    0.1   918.5   0.273
12   conv2d_8/Relu            604.0   700.1  13.684
13   conv2d_9/Relu           1208.0   703.9  27.196
14   conv2d_transpose/Relu      0.0   395.4  10.436
15   conv2d_10/Relu          2415.9   280.6  48.150
16   conv2d_11/Relu          1208.0   281.8  23.986
17   conv2d_transpose_1/Relu    0.0   225.1   5.555
18   conv2d_12/Relu          2415.9   203.5  55.309
19   conv2d_13/Relu          1208.0   194.9  28.888
20   conv2d_transpose_2/Relu    0.0   139.2   5.390
21   conv2d_14/Relu          2415.9   306.4  60.587
22   conv2d_15/Relu          1208.0   307.5  30.193
23   conv2d_transpose_3/Relu    0.0   147.0   7.230
24   conv2d_16/Relu          2415.9   646.4  55.913
25   conv2d_17/Relu          1208.0   668.3  27.041
26   output/BiasAdd            46.1  1230.8   1.627
----------------------------------------------------
       Total inference time                  560.74
----------------------------------------------------

idata · ‎09-14-2018

@PINTO It looks like the results exceed the fp16 value limit (the largest finite fp16 value is 65504) after the conv2d_5/Relu layer.

idata · ‎09-14-2018

$ mvNCCheck deployfinal.ckpt.meta -s 12 -on max_pooling2d_2/MaxPool
/usr/lib/python3/dist-packages/scipy/stats/morestats.py:16: DeprecationWarning: Importing from numpy.testing.decorators is deprecated, 
USB: Myriad Execution Finished
USB: Myriad Connection Closing.
USB: Myriad Connection Closed.
Result:  (16, 16, 256)
1) 4582 27180.0
2) 7142 26780.0
3) 4838 26720.0
4) 7910 26420.0
5) 7398 26380.0
Expected:  (16, 16, 256)
1) 4582 27224.69
2) 7142 26751.467
3) 4838 26726.771
4) 7398 26461.34
5) 7910 26399.03
------------------------------------------------------------
 Obtained values 
------------------------------------------------------------
 Obtained Min Pixel Accuracy: 0.40614702738821507% (max allowed=2%), Pass
 Obtained Average Pixel Accuracy: 0.0062270752096083015% (max allowed=1%), Pass
 Obtained Percentage of wrong values: 0.0% (max allowed=0%), Pass
 Obtained Pixel-wise L2 error: 0.022377591814212915% (max allowed=1%), Pass
 Obtained Global Sum Difference: 111103.3046875
------------------------------------------------------------

and then in the next layer:

$ mvNCCheck deployfinal.ckpt.meta -s 12 -on conv2d_6/Relu
mvNCCheck v02.00, Copyright @ Intel Corporation 2017
USB: Transferring Data...
USB: Myriad Execution Finished
USB: Myriad Connection Closing.
USB: Myriad Connection Closed.
Result:  (16, 16, 512)
1) 117316 nan
2) 115780 nan
3) 115268 nan
4) 116450 nan
5) 117050 nan
Expected:  (16, 16, 512)
1) 125676 358967.16
2) 126188 354048.88
3) 123628 351221.84
4) 130284 350524.97
5) 125164 348793.38
------------------------------------------------------------
 Obtained values 
------------------------------------------------------------
 Obtained Min Pixel Accuracy: nan% (max allowed=2%), Fail
 Obtained Average Pixel Accuracy: nan% (max allowed=1%), Fail
 Obtained Percentage of wrong values: 7.704925537109375% (max allowed=0%), Fail
 Obtained Pixel-wise L2 error: nan% (max allowed=1%), Fail
 Obtained Global Sum Difference: nan
------------------------------------------------------------
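
The two checks above are consistent with an fp16 overflow: the largest expected value at max_pooling2d_2/MaxPool (27224.69) fits in half precision, while the largest expected value at conv2d_6/Relu (358967.16) does not. The following is only a minimal NumPy sketch of that arithmetic on the host. It does not reproduce what the Myriad device does internally, and the variable names are chosen here for illustration.

```python
import numpy as np

# Largest finite value representable in IEEE 754 half precision (fp16).
FP16_MAX = float(np.finfo(np.float16).max)

# Top expected (float32) activations copied from the two mvNCCheck runs above.
pool_peak = np.float32(27224.69)    # max_pooling2d_2/MaxPool
conv6_peak = np.float32(358967.16)  # conv2d_6/Relu

with np.errstate(over="ignore", invalid="ignore"):
    # Casting to fp16 keeps the first value (rounded) and overflows the second to inf.
    pool_fp16 = pool_peak.astype(np.float16)
    conv6_fp16 = conv6_peak.astype(np.float16)
    # Once an inf is present, ordinary arithmetic such as inf - inf yields NaN.
    nan_from_inf = conv6_fp16 - conv6_fp16

print(FP16_MAX, pool_fp16, conv6_fp16, nan_from_inf)
```

This only shows that values of this size cannot be stored in fp16 and that an inf can turn into NaN in later arithmetic. Where exactly the NaN first appears on the device is not determined by this sketch.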

idata · ‎09-16-2018

@Tome_at_Intel

Thank you for your kind answer.

Thanks to you, I learned how to use mvNCCheck for the first time.

I then adjusted the input resolution, the filter size, and the number of classes.

However, the overflow still occurs even after these adjustments.
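
One way to see which layers are at risk before converting is to look at the float32 activations from a normal TensorFlow run and flag every layer whose peak magnitude is above the fp16 limit. The sketch below assumes the activations have already been collected into a dict of NumPy arrays keyed by layer name (how to collect them is not shown). The function name `find_fp16_overflows` is hypothetical, and the toy data reuses peak values from the mvNCCheck "Expected" lists earlier in this thread.

```python
import numpy as np

# Largest finite value representable in half precision (65504.0).
FP16_MAX = float(np.finfo(np.float16).max)

def find_fp16_overflows(activations):
    """Return (layer name, peak |value|) for layers whose peak exceeds the fp16 range.

    `activations` maps layer names to float32 arrays; this layout is an assumption.
    """
    overflows = []
    for name, values in activations.items():
        peak = float(np.max(np.abs(values)))
        if peak > FP16_MAX:
            overflows.append((name, peak))
    return overflows

# Toy data built from peak values reported earlier in this thread.
example = {
    "max_pooling2d_2/MaxPool": np.array([27224.69, 26751.467], dtype=np.float32),
    "conv2d_6/Relu": np.array([358967.16, 354048.88], dtype=np.float32),
}
overflowing = find_fp16_overflows(example)
print(overflowing)
```

With this toy data only conv2d_6/Relu is flagged. A layer that passes this check can still lose precision in fp16, so this is a first filter rather than a guarantee.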

The behavior of Deconv seems strange to me.

On a separate note, when I run mvNCCheck several times on the same layer under the same conditions, it sometimes succeeds and sometimes fails.

The behavior does not seem to be stable.
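
To put a number on this instability, the text output of each repeated run could be saved and scanned for NaN in the device results. The sketch below parses only the "Result:" / "Expected:" layout seen in the logs in this thread. The function names `result_values` and `run_has_nan` are hypothetical, and capturing the tool output is left out.

```python
import math
import re

# Matches ranked lines such as "1) 117316 nan" or "2) 7142 26780.0".
RANKED_LINE = re.compile(r"^\s*\d+\)\s+(\d+)\s+(\S+)\s*$")

def result_values(report):
    """Return the values listed under 'Result:' (device output) in a report."""
    values = []
    in_result = False
    for line in report.splitlines():
        if line.startswith("Result:"):
            in_result = True
        elif line.startswith("Expected:"):
            in_result = False
        elif in_result:
            match = RANKED_LINE.match(line)
            if match:
                values.append(float(match.group(2)))
    return values

def run_has_nan(report):
    """True if any device-side top value in the report is NaN."""
    return any(math.isnan(v) for v in result_values(report))

# Excerpt in the same layout as the conv2d_6/Relu check earlier in this thread.
sample = """Result:  (16, 16, 512)
1) 117316 nan
2) 115780 nan
Expected:  (16, 16, 512)
1) 125676 358967.16
2) 126188 354048.88
"""
print(run_has_nan(sample))
```

Counting how many of N saved reports return True from `run_has_nan` would give a rough failure rate for a given layer.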

I am close to giving up on the conversion...

By the way, with pure TensorFlow I have succeeded in high-speed one-class segmentation.

Detailed Per Layer Profile
                                                                            Bandwidth   time
#    Name                                                             MFLOPs  (MB/s)    (ms)
============================================================================================
0    conv2d/Relu/batch_normalization/FusedBatchNorm                      7.1  1002.5   0.842
1    conv2d_1/Relu/batch_normalization_1/FusedBatchNorm                 18.9   943.6   2.386
2    max_pooling2d/MaxPool                                               0.1   380.1   0.658
3    conv2d_2/Relu/batch_normalization_2/FusedBatchNorm                  9.4   750.9   0.752
4    conv2d_3/Relu/batch_normalization_3/FusedBatchNorm                 18.9   914.7   1.235
5    max_pooling2d_1/MaxPool                                             0.1   489.8   0.255
6    conv2d_4/Relu/batch_normalization_4/FusedBatchNorm                  9.4   476.2   0.610
7    conv2d_5/Relu/batch_normalization_5/FusedBatchNorm                 18.9   729.9   0.795
8    max_pooling2d_2/MaxPool                                             0.0   521.5   0.120
9    conv2d_6/Relu/batch_normalization_6/FusedBatchNorm                  9.4   426.3   0.415
10   conv2d_7/Relu/batch_normalization_7/FusedBatchNorm                 18.9   485.8   0.726
11   max_pooling2d_3/MaxPool                                             0.0   481.7   0.065
12   conv2d_8/Relu                                                       9.4   368.9   0.578
13   conv2d_9/Relu                                                      18.9   529.9   0.800
14   conv2d_transpose/Relu                                               0.0   284.1   0.275
15   conv2d_10/Relu                                                     37.7   545.3   1.291
16   conv2d_11/Relu                                                     18.9   485.4   0.727
17   conv2d_transpose_1/Relu                                             0.0   178.9   0.262
18   conv2d_12/Relu                                                     37.7   790.8   1.468
19   conv2d_13/Relu                                                     18.9   744.1   0.780
20   conv2d_transpose_2/Relu                                             0.0   117.9   0.563
21   conv2d_14/Relu                                                     37.7  1081.9   2.088
22   conv2d_15/Relu                                                     18.9   948.7   1.191
23   conv2d_transpose_3/Relu                                             0.0    79.8   1.579
24   conv2d_16/Relu                                                     37.7  1289.9   3.490
25   conv2d_17/Relu                                                     18.9  1029.7   2.186
26   output/BiasAdd                                                      0.8   471.2   0.531
--------------------------------------------------------------------------------------------
                                                Total inference time                   26.67
--------------------------------------------------------------------------------------------

xxxx@ubuntu:~/git/segmentation_unet/model$ mvNCCheck deployfinal.ckpt.meta -s 12 -on conv2d_13/Relu
/usr/local/bin/ncsdk/Controllers/Parsers/TensorFlowParser/Convolution.py:44: SyntaxWarning: assertion is always true, perhaps remove parentheses?
  assert(False, "Layer type not supported by Convolution: " + obj.type)
mvNCCheck v02.00, Copyright @ Intel Corporation 2017

shape: [1, 128, 128, 3]
res.shape:  (1, 32, 32, 32)
TensorFlow output shape:  (32, 32, 32)
/usr/local/bin/ncsdk/Controllers/FileIO.py:65: UserWarning: You are using a large type. Consider reducing your data sizes for best performance
Blob generated
USB: Transferring Data...
USB: Myriad Execution Finished
USB: Myriad Connection Closing.
USB: Myriad Connection Closed.
Result:  (32, 32, 32)
1) 27044 3.752
2) 26025 3.7402
3) 26020 3.6934
4) 25001 3.625
5) 2916 3.5957
Expected:  (32, 32, 32)
1) 20912 53.0136
2) 21936 52.6455
3) 19952 52.3813
4) 19888 52.3808
5) 22000 51.612
------------------------------------------------------------
 Obtained values 
------------------------------------------------------------
 Obtained Min Pixel Accuracy: 100.0% (max allowed=2%), Fail
 Obtained Average Pixel Accuracy: 10.501158237457275% (max allowed=1%), Fail
 Obtained Percentage of wrong values: 46.2738037109375% (max allowed=0%), Fail
 Obtained Pixel-wise L2 error: 19.24379799670513% (max allowed=1%), Fail
 Obtained Global Sum Difference: 182420.984375
------------------------------------------------------------

xxxx@ubuntu:~/git/segmentation_unet/model$ mvNCCheck deployfinal.ckpt.meta -s 12 -on conv2d_transpose_2/Relu
/usr/local/bin/ncsdk/Controllers/Parsers/TensorFlowParser/Convolution.py:44: SyntaxWarning: assertion is always true, perhaps remove parentheses?
  assert(False, "Layer type not supported by Convolution: " + obj.type)
mvNCCheck v02.00, Copyright @ Intel Corporation 2017

shape: [1, 128, 128, 3]
res.shape:  (1, 64, 64, 16)
TensorFlow output shape:  (64, 64, 16)
/usr/local/bin/ncsdk/Controllers/FileIO.py:65: UserWarning: You are using a large type. Consider reducing your data sizes for best performance
Blob generated
USB: Transferring Data...
USB: Myriad Execution Finished
USB: Myriad Connection Closing.
USB: Myriad Connection Closed.
Result:  (64, 64, 16)
1) 65535 nan
2) 65534 nan
3) 21853 nan
4) 21852 nan
5) 21851 nan
Expected:  (64, 64, 16)
1) 60884 47.6098
2) 60948 46.8669
3) 61012 45.4951
4) 60820 44.7391
5) 61076 44.1104
/usr/local/bin/ncsdk/Controllers/Metrics.py:75: RuntimeWarning: invalid value encountered in greater
------------------------------------------------------------
 Obtained values 
------------------------------------------------------------
 Obtained Min Pixel Accuracy: nan% (max allowed=2%), Fail
 Obtained Average Pixel Accuracy: nan% (max allowed=1%), Fail
 Obtained Percentage of wrong values: 0.0% (max allowed=0%), Fail
 Obtained Pixel-wise L2 error: nan% (max allowed=1%), Fail
 Obtained Global Sum Difference: nan
------------------------------------------------------------

idata · ‎09-16-2018