I have two models. The larger one has 4.09M params and 2.99 GFLOPs; the smaller one has 2.17M params and 1.82 GFLOPs. The two models have different structures: the larger one uses MobileNetV1 as the backbone and several convs as the head, while the smaller one uses MobileNetV2 as the backbone, several convs and convTransposes as the neck, and several convs as the head. When I run inference on both models through the C++ interface, the larger one's average inference time is 70 ms, but the smaller one's is 80 ms. Can anyone help me figure out why?
As you mentioned, the two models use different backbones, and the rest of their architectures differ as well.
FYI, in MobileNet V1 the pointwise convolution either keeps the number of channels the same or doubles them. In MobileNet V2 it does the opposite: it makes the number of channels smaller. This is why the layer is now known as the projection layer, because it projects data with a high number of dimensions (channels) into a tensor with a much lower number of dimensions.
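Both MobileNet V1 and V2 use depthwise separable convolutions, which have historically not been as well optimized in GPU libraries (such as cuDNN) as standard convolutions. MobileNet V2 stacks more layers per block (expansion, depthwise, and projection), so it tends to run slower than its FLOP count suggests.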
This is one of the factors that can make a smaller model slower. Inference time does not depend solely on model size.
You may also refer to this issue: https://github.com/tensorflow/tensorflow/issues/21196
Hope this helps!
Intel will no longer monitor this thread since we have provided a solution. If you need any additional information from Intel, please submit a new question.