Hi all, I've been playing with the examples in the zoo and have a question about image aspect ratio: how do I ensure maximum accuracy? For example, the SSD MobileNet app resizes images to 300x300, which is a square (1:1) ratio:
# Neural network assumes input images are these dimensions.
SSDMN_NETWORK_IMAGE_WIDTH = 300
SSDMN_NETWORK_IMAGE_HEIGHT = 300
But the videos it downloads and uses as examples are all 960x540, which is a 16:9 ratio, for example:
The code uses OpenCV's resize function. Now, let's compare the results of resizing vs. cropping and then resizing. First, this is what resizing that video from 16:9 to 1:1 looks like:
This is what it looks like when you crop first and then resize:
These are clearly quite different, and my question is: how does this affect accuracy? Should I crop my video to 1:1 instead, or do the example models all expect 16:9 video squashed into a 1:1 ratio? To the eye, it would seem more logical that the model would perform better on images that are not distorted by resizing to a different ratio.
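For anyone who wants to try the crop-first variant, here is a minimal sketch of how the crop region could be computed. It assumes a centered square crop is what you want, and `center_crop_box` is a hypothetical helper name, not something from the zoo examples.

```python
def center_crop_box(width, height):
    """Return (x0, y0, x1, y1) for the largest centered square in a frame.

    Hypothetical helper for illustration; not part of the example apps.
    """
    side = min(width, height)          # a square cannot exceed the short edge
    x0 = (width - side) // 2           # equal margin trimmed left and right
    y0 = (height - side) // 2          # equal margin trimmed top and bottom
    return x0, y0, x0 + side, y0 + side


# The 960x540 example video keeps only its middle 540x540 region.
print(center_crop_box(960, 540))  # → (210, 0, 750, 540)
```

With a frame held as a NumPy array, the crop would then be `frame[y0:y1, x0:x1]`, followed by `cv2.resize(crop, (300, 300))`. Note the trade-off: for 960x540 this discards 210 columns on each side, so objects near the left and right edges are never seen by the network.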
Thanks in advance.
I was thinking about this too recently. I think the aspect ratio doesn't matter as long as it remains consistent. If the images you input to the network are always 16:9, then training on 16:9 images is fine. This is because the pre-processing before input to the network always produces the same 'squashedness', so the objects you are trying to learn to detect will always be distorted by the same amount.
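To put a number on that 'squashedness', here is a rough sketch using the 960x540 source and 300x300 network input from this thread. `squash_factors` is a hypothetical helper name, and the 300x300 defaults are just the values quoted above.

```python
def squash_factors(src_w, src_h, dst_w=300, dst_h=300):
    """Return (scale_x, scale_y, relative_stretch) for a plain resize.

    relative_stretch is scale_y / scale_x: 1.0 means no distortion,
    values above 1.0 mean objects end up taller and narrower than in reality.
    """
    scale_x = dst_w / src_w
    scale_y = dst_h / src_h
    return scale_x, scale_y, scale_y / scale_x


scale_x, scale_y, stretch = squash_factors(960, 540)
print(f"x: {scale_x:.4f}, y: {scale_y:.4f}, stretch: {stretch:.3f}")
```

For a 16:9 source the relative stretch works out to 16/9, about 1.78. As long as training and inference images share the same source ratio, this factor is identical on both sides, which is the consistency argument above. If the ratios differ, the model sees shapes it was not trained on.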
Thanks Victor, agreed. I ran some experiments myself and came to the same conclusion: warping or distorting the image is a bad idea, even if it can still work.
I think the examples in the ncappzoo should be changed to work on cropped video, otherwise it's just misleading for newcomers who might assume that 16:9 is the desirable ratio.