The Myriad X VPU product brief states: "Myriad X VPU is capable of delivering a total performance of over 4 trillion operations per second (TOPS)".
Another page says: "Over 1 trillion operations per second of DNN inferencing performance".
I also tested OpenVINO pre-trained models and got the following results:
face-detection-retail-0004 (complexity 1.067 GFLOPs) takes 20 ms per image: 1.067 / 0.02 ≈ 53 GFLOPS
human-pose-estimation-0001 (complexity 15.435 GFLOPs) takes 200 ms per image: 15.435 / 0.2 ≈ 77 GFLOPS
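Written out, the estimate is per-inference complexity divided by single-image latency. This only restates the arithmetic in this post, taking the quoted 15.435 GFLOPs as the complexity of the second model; the symbols C and t are just labels introduced here.

```latex
\[
\text{throughput} = \frac{C}{t}, \qquad
\frac{1.067\ \text{GFLOPs}}{0.020\ \text{s}} \approx 53.4\ \text{GFLOPS}, \qquad
\frac{15.435\ \text{GFLOPs}}{0.200\ \text{s}} \approx 77.2\ \text{GFLOPS}
\]
```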
Why does the product brief claim 4 TOPS and 1 TOPS when my measured results are below 100 GFLOPS?
Could you also explain the correct way to evaluate performance (FLOPS)?
- 4 TOPS is the total compute capacity of all ALUs on the SoC. That includes fixed-function blocks, such as stereo depth, that are not applicable to neural networks at all. So 1 TOPS is closer to the real compute power available for inference, but it is still a coarsely rounded number.
- A single inference task has access to only half of the device. To get maximum utilization, you should run >2 concurrent inference tasks on the same device.
- Given your performance numbers, your measurement almost certainly includes all data transfer overhead.
- The GFLOPs figures for network complexity that you quote do not include all the computation required to perform inference. They cover essentially only the convolutions. The other operations need less compute but are memory bound, so they contribute non-trivially to inference wall time.
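To make the points above concrete, here is a minimal sketch of a throughput measurement that keeps several requests in flight instead of timing one image at a time. It is not an OpenVINO API: `measure_gflops` and `fake_infer` are hypothetical names, and `fake_infer` is a 20 ms sleep standing in for whatever call actually runs one inference on the device. The 1.067 GFLOPs figure is the model complexity quoted in the question.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_gflops(infer, gflop_per_inference, n_requests=32, concurrency=2):
    """Run `infer` n_requests times with `concurrency` requests in flight.

    Returns (inferences per second, effective GFLOPS). The GFLOPS value is
    only as complete as gflop_per_inference, which typically counts mostly
    the convolutions.
    """
    infer()  # warm-up call, excluded from the timing
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(infer) for _ in range(n_requests)]
        for future in futures:
            future.result()  # wait for completion and surface any exception
    elapsed = time.perf_counter() - start
    inferences_per_second = n_requests / elapsed
    return inferences_per_second, inferences_per_second * gflop_per_inference


def fake_infer():
    # Hypothetical stand-in: replace with the real blocking inference call.
    time.sleep(0.02)


if __name__ == "__main__":
    for concurrency in (1, 2, 4):
        ips, gflops = measure_gflops(fake_infer, 1.067, concurrency=concurrency)
        print(f"concurrency={concurrency}: {ips:.1f} inf/s, {gflops:.1f} GFLOPS")
```

With the sleep-based stand-in, throughput scales with the number of workers because sleeps overlap freely; on a real device the gain stops once the hardware is saturated. Since the timing here wraps the whole call, it still includes data transfer overhead, so it reports end-to-end throughput rather than pure compute.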
All in all, Bruce, if you improve your measurement accuracy, you should get numbers 2 to 2.5 times higher. I agree that some of the Myriad X VPU documentation assumes a lot and does not give the detail I have just given you.
Thanks for using OpenVINO!