Is it worth aligning the scan lines of an image so that each row begins on a 16-byte-aligned address? That is, rounding the stride up to the next multiple of 16 bytes?
I assume this might help a bit when processing the entire image, but the real question is: does IPP care?
If so, along the same lines, is it worth 32-byte-aligning scan lines on CPUs that have a 256-bit vector unit, or 64-byte-aligning them for AVX-512 chips?
When IPP allocates memory buffers for image processing, it ensures that the data is aligned appropriately, so the source image itself doesn't need to be in aligned memory for the best performance.
Please see the functions below.
void* ippMalloc(int length);
void ippFree(void* ptr);
@Jon: thanks for your reply, but my question was slightly different. Your answer only guarantees that the first scan line in the buffer is properly aligned; I was asking about the other ones. I should probably adjust the stride so that *each* scan line is aligned.
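For reference, here is a minimal sketch of the stride rounding I have in mind, in plain C with no IPP calls. The helper names (aligned_stride, alloc_aligned_image, all_rows_aligned) are hypothetical, and it assumes a C11 runtime that provides aligned_alloc and an alignment that is a power of two. It only shows how to make every row start on an aligned address; it says nothing about whether IPP actually runs faster on such a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: round a row size in bytes up to the next multiple
   of `alignment`. `alignment` must be a power of two (16, 32, 64, ...). */
static size_t aligned_stride(size_t row_bytes, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (row_bytes + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: allocate an image whose base address and stride are
   both multiples of `alignment`, so every scan line starts aligned.
   Assumes C11 aligned_alloc; the caller releases the buffer with free(). */
static unsigned char *alloc_aligned_image(size_t row_bytes, size_t height,
                                          size_t alignment, size_t *stride_out)
{
    size_t stride = aligned_stride(row_bytes, alignment);
    *stride_out = stride;
    /* aligned_alloc wants a size that is a multiple of the alignment;
       stride * height satisfies that because stride already does. */
    return aligned_alloc(alignment, stride * height);
}

/* Returns 1 if every row of the image starts on an aligned address. */
static int all_rows_aligned(const unsigned char *base, size_t stride,
                            size_t height, size_t alignment)
{
    for (size_t y = 0; y < height; ++y) {
        if ((uintptr_t)(base + y * stride) % alignment != 0)
            return 0;
    }
    return 1;
}
```

With a 100-byte row, for example, the stride becomes 112 bytes for 16-byte alignment and 128 bytes for 32- or 64-byte alignment. The padding at the end of each row is simply never read as pixel data.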