Hello,
I benchmarked the execution speed of ShuffleNet V2 on the Movidius NCS1 stick and only obtained about 9 fps, which is slower than MobileNet V2, which achieves 23 fps. Looking at the per-layer execution times, I saw that the reshape operation takes about 34% of the compute time. The reshape is needed for the channel shuffle operation, which is implemented like this:
import tensorflow as tf

def concat_shuffle_split(x, y):
    with tf.name_scope('concat_shuffle_split'):
        shape = tf.shape(x)
        batch_size = shape[0]
        height, width = shape[1], shape[2]
        depth = x.shape[3].value  # channels per branch (static)
        z = tf.concat([x, y], axis=3)  # [B, H, W, 2*depth]
        # Shuffle channels: view as [..., 2, depth], swap the last two
        # axes, then flatten back to [..., 2*depth].
        z = tf.reshape(z, [batch_size, height, width, 2, depth])
        z = tf.transpose(z, [0, 1, 2, 4, 3])
        z = tf.reshape(z, [batch_size, height, width, 2 * depth])
        x, y = tf.split(z, num_or_size_splits=2, axis=3)
        return x, y
Is it normal for the reshape operation to take that long? Does the reshape need to move memory blocks around, and is that why it takes so much time?
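To illustrate what I mean by moving memory blocks, here is a quick NumPy check of the distinction (NumPy view semantics; I am assuming the Myriad plugin behaves analogously, which may not hold):

import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# A reshape of a contiguous array is just new metadata over the same buffer.
print(np.shares_memory(a, a.reshape(2, 12)))             # True

# A transpose changes the memory order, so materializing the result
# (as the second reshape in the shuffle must) needs a new buffer.
t = a.transpose(0, 2, 1)
print(np.shares_memory(a, np.ascontiguousarray(t)))      # False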
Thanks in advance,
Emiel Deprost
Dear Emiel,
The NCS1 stick does perform worse than the NCS2. To give you a transparent answer: your shuffle uses 5D tensors, which forces a lot of data re-layout and re-ordering on the device.
There is a good chance that Reshape will perform better if you avoid 5D data.
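For example, here is a minimal sketch of how the shuffle could stay in 4D by folding the reshape/transpose/reshape into a single channel permutation with tf.gather. This is untested on the NCS1, and whether Gather maps to a faster Myriad kernel than Reshape is an assumption you would need to benchmark:

import tensorflow as tf

def concat_shuffle_split_gather(x, y):
    # Exact 4D equivalent of the reshape/transpose/reshape shuffle:
    # the three 5D ops are folded into one channel permutation.
    with tf.name_scope('concat_shuffle_split_gather'):
        depth = x.shape[3].value        # channels per branch (static)
        z = tf.concat([x, y], axis=3)   # [B, H, W, 2*depth]
        # perm[c] = (c % 2) * depth + c // 2 interleaves the two branches,
        # matching reshape -> transpose([0, 1, 2, 4, 3]) -> reshape.
        perm = [(c % 2) * depth + c // 2 for c in range(2 * depth)]
        z = tf.gather(z, perm, axis=3)
        x, y = tf.split(z, num_or_size_splits=2, axis=3)
        return x, y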
Hope it helps!
Thanks for using OpenVINO,
Shubha
Hi Emiel,
Can you confirm that you are using OpenVINO to run this benchmark?
Best,
Severine