Intel® Distribution of OpenVINO™ Toolkit

NCS2 stages optimization problem

nago

Hello,

I generated the .xml and .bin files from Tiny YOLOv2 (VOC) by following the Intel guide.

I am able to make it work with the NCS1.

To get valid results with the NCS2, I have to add the following option:

plugin.SetConfig({ {"VPU_HW_STAGES_OPTIMIZATION","NO"}});

With this option set, the frame rate drops by a factor of about 3.
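
As a rough check on that factor, the per-inference totals reported by the performance counters later in this post (128745 microseconds with the option, 40005 microseconds without) can be converted into an upper bound on frame rate. This is only a sketch: it assumes one inference in flight at a time and ignores host-side overhead, and the function names are illustrative.

```cpp
// Upper bound on frames per second when each frame costs
// `totalMicroseconds` of device time (assumption: no pipelining,
// no host-side overhead).
double framesPerSecond(double totalMicroseconds) {
    return 1e6 / totalMicroseconds;
}

// How many times slower the first configuration is than the second.
double slowdown(double slowMicroseconds, double fastMicroseconds) {
    return slowMicroseconds / fastMicroseconds;
}
```

With the totals above, framesPerSecond(128745) is a little under 8 fps and framesPerSecond(40005) is about 25 fps, so slowdown(128745, 40005) comes out at roughly 3.2.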

With printPerformanceCounts we can see the different optimizations applied for the NCS2:

With VPU_HW_STAGES_OPTIMIZATION = NO:

0-convolutional               EXECUTED       layerType: Convolution        realTime: 2689       cpu: 2689           execType: Conv
0-convolutional@biases        EXECUTED       layerType: Convolution        realTime: 1480       cpu: 1480           execType: Bias
11-maxpool                    EXECUTED       layerType: Pooling            realTime: 173        cpu: 173            execType: MaxPool
12-convolutional              EXECUTED       layerType: Convolution        realTime: 7336       cpu: 7336           execType: Conv
12-convolutional@biases       EXECUTED       layerType: Convolution        realTime: 138        cpu: 138            execType: Bias
14-maxpool                    EXECUTED       layerType: Pooling            realTime: 160        cpu: 160            execType: MaxPool
15-convolutional              EXECUTED       layerType: Convolution        realTime: 7232       cpu: 7232           execType: Conv
15-convolutional@biases       EXECUTED       layerType: Convolution        realTime: 89         cpu: 89             execType: Bias
17-maxpool                    EXECUTED       layerType: Pooling            realTime: 176        cpu: 176            execType: MaxPool
18-convolutional              EXECUTED       layerType: Convolution        realTime: 37231      cpu: 37231          execType: Conv
18-convolutional@biases       EXECUTED       layerType: Convolution        realTime: 145        cpu: 145            execType: Bias
2-maxpool                     EXECUTED       layerType: Pooling            realTime: 954        cpu: 954            execType: MaxPool
20-convolutional              EXECUTED       layerType: Convolution        realTime: 38027      cpu: 38027          execType: Conv
20-convolutional@biases       EXECUTED       layerType: Convolution        realTime: 142        cpu: 142            execType: Bias
22-convolutional              EXECUTED       layerType: Convolution        realTime: 1308       cpu: 1308           execType: Conv
22-convolutional@biases       EXECUTED       layerType: Convolution        realTime: 60         cpu: 60             execType: Bias
3-convolutional               EXECUTED       layerType: Convolution        realTime: 5416       cpu: 5416           execType: Conv
3-convolutional@biases        EXECUTED       layerType: Convolution        realTime: 764        cpu: 764            execType: Bias
5-maxpool                     EXECUTED       layerType: Pooling            realTime: 514        cpu: 514            execType: MaxPool
6-convolutional               EXECUTED       layerType: Convolution        realTime: 4741       cpu: 4741           execType: Conv
6-convolutional@biases        EXECUTED       layerType: Convolution        realTime: 404        cpu: 404            execType: Bias
8-maxpool                     EXECUTED       layerType: Pooling            realTime: 269        cpu: 269            execType: MaxPool
9-convolutional               EXECUTED       layerType: Convolution        realTime: 14747      cpu: 14747          execType: Conv
9-convolutional@biases        EXECUTED       layerType: Convolution        realTime: 227        cpu: 227            execType: Bias
LeakyReLU_                    EXECUTED       layerType: ReLU               realTime: 778        cpu: 778            execType: LeakyRelu
LeakyReLU_372                 EXECUTED       layerType: ReLU               realTime: 123        cpu: 123            execType: LeakyRelu
LeakyReLU_373                 EXECUTED       layerType: ReLU               realTime: 401        cpu: 401            execType: LeakyRelu
LeakyReLU_374                 EXECUTED       layerType: ReLU               realTime: 125        cpu: 125            execType: LeakyRelu
LeakyReLU_375                 EXECUTED       layerType: ReLU               realTime: 124        cpu: 124            execType: LeakyRelu
LeakyReLU_376                 EXECUTED       layerType: ReLU               realTime: 198        cpu: 198            execType: LeakyRelu
LeakyReLU_377                 EXECUTED       layerType: ReLU               realTime: 97         cpu: 97             execType: LeakyRelu
LeakyReLU_378                 EXECUTED       layerType: ReLU               realTime: 1489       cpu: 1489           execType: LeakyRelu
Receive-Tensor                EXECUTED       layerType: Receive-Tensor     realTime: 0          cpu: 0              execType: Receive-Tensor
input@FP16                    EXECUTED       layerType: <Extra>            realTime: 588        cpu: 588            execType: Convert_f32f16
output/YoloRegion             EXECUTED       layerType: RegionYolo         realTime: 359        cpu: 359            execType: RegionYolo
output/YoloRegion@FP16        EXECUTED       layerType: <Extra>            realTime: 41         cpu: 41             execType: Convert_f16f32
Total time: 128745   microseconds

Without VPU_HW_STAGES_OPTIMIZATION = NO (hardware stages optimization enabled):

0-convolutional@soh=1/5       EXECUTED       layerType: Convolution        realTime: 1727       cpu: 1727           execType: MyriadXHwConvolution
0-convolutional@soh=2/5 + ... EXECUTED       layerType: Convolution        realTime: 1721       cpu: 1721           execType: MyriadXHwConvolution + injected[Copy]
0-convolutional@soh=3/5 + ... EXECUTED       layerType: Convolution        realTime: 1722       cpu: 1722           execType: MyriadXHwConvolution + injected[Copy]
0-convolutional@soh=4/5 + ... EXECUTED       layerType: Convolution        realTime: 1725       cpu: 1725           execType: MyriadXHwConvolution + injected[Copy]
0-convolutional@soh=5/5 + ... EXECUTED       layerType: Convolution        realTime: 414        cpu: 414            execType: MyriadXHwConvolution + injected[Copy]
0-convolutional@soh=5/5@co... EXECUTED       layerType: Convolution        realTime: 22         cpu: 22             execType: Copy
11-maxpool                    OPTIMIZED_OUT  layerType: Pooling            realTime: 0          cpu: 0              execType: Pooling
12-convolutional              EXECUTED       layerType: Convolution        realTime: 1390       cpu: 1390           execType: MyriadXHwConvolution
14-maxpool                    OPTIMIZED_OUT  layerType: Pooling            realTime: 0          cpu: 0              execType: Pooling
15-convolutional              EXECUTED       layerType: Convolution        realTime: 1718       cpu: 1718           execType: MyriadXHwConvolution
17-maxpool                    EXECUTED       layerType: Pooling            realTime: 139        cpu: 139            execType: MyriadXHwPooling
17-maxpool@padding            EXECUTED       layerType: Pooling            realTime: 1245       cpu: 1245           execType: CopyMakeBorder
18-convolutional@soc=1/2      EXECUTED       layerType: Convolution        realTime: 3350       cpu: 3350           execType: MyriadXHwConvolution
18-convolutional@soc=2/2      EXECUTED       layerType: Convolution        realTime: 3284       cpu: 3284           execType: MyriadXHwConvolution
18-convolutional@soc=2/2@ReLU EXECUTED       layerType: Convolution        realTime: 204        cpu: 204            execType: LeakyRelu
2-maxpool                     OPTIMIZED_OUT  layerType: Pooling            realTime: 0          cpu: 0              execType: Pooling
20-convolutional@soc=1/3      EXECUTED       layerType: Convolution        realTime: 4478       cpu: 4478           execType: MyriadXHwConvolution
20-convolutional@soc=2/3      EXECUTED       layerType: Convolution        realTime: 4412       cpu: 4412           execType: MyriadXHwConvolution
20-convolutional@soc=3/3 +... EXECUTED       layerType: Convolution        realTime: 4430       cpu: 4430           execType: MyriadXHwConvolution + injected[Sum]
20-convolutional@soc=3/3@ReLU EXECUTED       layerType: Convolution        realTime: 199        cpu: 199            execType: LeakyRelu
22-convolutional              EXECUTED       layerType: Convolution        realTime: 168        cpu: 168            execType: MyriadXHwConvolution
3-convolutional@soh=1/3       EXECUTED       layerType: Convolution        realTime: 1385       cpu: 1385           execType: MyriadXHwConvolution
3-convolutional@soh=2/3 + ... EXECUTED       layerType: Convolution        realTime: 1396       cpu: 1396           execType: MyriadXHwConvolution + injected[Copy]
3-convolutional@soh=3/3 + ... EXECUTED       layerType: Convolution        realTime: 185        cpu: 185            execType: MyriadXHwConvolution + injected[Copy]
3-convolutional@soh=3/3@co... EXECUTED       layerType: Convolution        realTime: 18         cpu: 18             execType: Copy
5-maxpool                     OPTIMIZED_OUT  layerType: Pooling            realTime: 0          cpu: 0              execType: Pooling
6-convolutional               EXECUTED       layerType: Convolution        realTime: 1427       cpu: 1427           execType: MyriadXHwConvolution
8-maxpool                     OPTIMIZED_OUT  layerType: Pooling            realTime: 0          cpu: 0              execType: Pooling
9-convolutional               EXECUTED       layerType: Convolution        realTime: 1283       cpu: 1283           execType: MyriadXHwConvolution
LeakyReLU_                    OPTIMIZED_OUT  layerType: ReLU               realTime: 0          cpu: 0              execType: ReLU
LeakyReLU_372                 OPTIMIZED_OUT  layerType: ReLU               realTime: 0          cpu: 0              execType: ReLU
LeakyReLU_372@soc=2/2@accum   EXECUTED       layerType: Convolution        realTime: 274        cpu: 274            execType: Sum
LeakyReLU_373                 OPTIMIZED_OUT  layerType: ReLU               realTime: 0          cpu: 0              execType: ReLU
LeakyReLU_374                 OPTIMIZED_OUT  layerType: ReLU               realTime: 0          cpu: 0              execType: ReLU
LeakyReLU_375                 OPTIMIZED_OUT  layerType: ReLU               realTime: 0          cpu: 0              execType: ReLU
LeakyReLU_375@soc=3/3@accum   EXECUTED       layerType: Convolution        realTime: 211        cpu: 211            execType: Sum
LeakyReLU_376                 OPTIMIZED_OUT  layerType: ReLU               realTime: 0          cpu: 0              execType: ReLU
LeakyReLU_377                 OPTIMIZED_OUT  layerType: ReLU               realTime: 0          cpu: 0              execType: ReLU
LeakyReLU_378                 OPTIMIZED_OUT  layerType: ReLU               realTime: 0          cpu: 0              execType: ReLU
Receive-Tensor                EXECUTED       layerType: Receive-Tensor     realTime: 0          cpu: 0              execType: Receive-Tensor
input@FP16                    EXECUTED       layerType: <Extra>            realTime: 909        cpu: 909            execType: Convert_f32f16
output/YoloRegion             EXECUTED       layerType: RegionYolo         realTime: 527        cpu: 527            execType: RegionYolo
output/YoloRegion@FP16        EXECUTED       layerType: <Extra>            realTime: 42         cpu: 42             execType: Convert_f16f32
Total time: 40005    microseconds
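
One way to compare the two reports above is to sum realTime per execType. Here is a minimal sketch, assuming the report has been captured as plain text in the format shown above. The helper names (parseRealTime, parseExecType, sumByExecType) are illustrative and not part of the Inference Engine API.

```cpp
#include <map>
#include <sstream>
#include <string>

// Return the integer that follows "realTime:" on one report line,
// or -1 if the line has no such field (e.g. the "Total time" line).
long parseRealTime(const std::string& line) {
    const std::string key = "realTime:";
    const auto pos = line.find(key);
    if (pos == std::string::npos) return -1;
    std::istringstream in(line.substr(pos + key.size()));
    long value = -1;
    in >> value;
    return in ? value : -1;
}

// Return the text that follows "execType:" with surrounding
// whitespace trimmed, or an empty string if the field is missing.
std::string parseExecType(const std::string& line) {
    const std::string key = "execType:";
    const auto pos = line.find(key);
    if (pos == std::string::npos) return "";
    const std::string rest = line.substr(pos + key.size());
    const auto first = rest.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = rest.find_last_not_of(" \t\r\n");
    return rest.substr(first, last - first + 1);
}

// Sum realTime (microseconds) per execType over a whole report.
std::map<std::string, long> sumByExecType(const std::string& report) {
    std::map<std::string, long> totals;
    std::istringstream in(report);
    std::string line;
    while (std::getline(in, line)) {
        const long us = parseRealTime(line);
        if (us < 0) continue;  // skip lines without a realTime field
        totals[parseExecType(line)] += us;
    }
    return totals;
}
```

Calling sumByExecType on each report and printing the resulting map shows which execution types account for the gap, for example the software Conv stages in the first report versus MyriadXHwConvolution in the second.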

 

I don't know how to get valid results on the NCS2 at maximum performance, that is, with the hardware stages optimization left enabled.

Best regards,

nago

 

 

 

 

 
