Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Images, Stride and Memory Alignment Question (IPP)

Rietschin__Axel
Beginner
689 Views

Hello there,

Is it worth aligning the scan lines in an image so that each row begins on a 16-byte-aligned address? That is, rounding the stride up to the next multiple of 16 bytes?

I assume this might help a bit when processing the entire image, but the real question is: does IPP care?

If yes, along the same lines, is it worth 32-byte-aligning scan lines on CPUs that have a 256-bit vector unit, or 64-byte-aligning them for AVX-512 chips?

Thanks,
Axel

4 Replies
Jonghak_K_Intel
Employee

Hi Axel,

When IPP allocates memory buffers for image processing, it ensures that data is aligned appropriately. So the source image itself doesn't need to be on aligned memory for the best performance.

Please see below.

Malloc/Free

Intel IPP functions provide better performance if they process data with aligned pointers. Intel IPP provides the following functions to ensure that data is aligned appropriately: 16-byte for CPUs that do not support the Intel® Advanced Vector Extensions (Intel® AVX) instruction set, 32-byte for Intel AVX and Intel® Advanced Vector Extensions 2 (Intel® AVX2), and 64-byte for Intel® Many Integrated Core instructions.

void* ippMalloc(int length)
void ippFree(void* ptr)

The ippMalloc function provides an appropriately aligned buffer, and the ippFree function frees it.

The signal and image processing libraries provide the ippsMalloc and ippiMalloc functions, respectively, to allocate an appropriately aligned buffer that can be freed by the ippsFree and ippiFree functions.

For an example of an image processing application, please refer to: https://software.intel.com/en-us/node/504353

Adriaan_van_Os
New Contributor I
Apropos ippMalloc, I looked with a debugger at its internal implementation on Mac OS X. I may be wrong, but it seems that ippMalloc calls malloc to allocate the requested size plus 0x44 bytes of memory and then returns an aligned pointer inside this memory area.

But Mac OS X and Linux have posix_memalign built in. This call was created specifically to return aligned memory blocks.

int posix_memalign(void **memptr, size_t alignment, size_t size);

Using posix_memalign has the following advantages:

1. free can be used for all memory blocks, simplifying memory management (currently the software has to know whether a memory block was allocated with malloc or ippMalloc).
2. Mac OS X has fantastic memory debugging facilities built in; they work with posix_memalign but not with the current ippMalloc implementation.
3. In a quick test, posix_memalign was twice as fast as ippMalloc.
4. No extra bytes are required for the alignment (as posix_memalign is part of the OS).

Regards,
Adriaan van Os
Rietschin__Axel
Beginner

@Jon: thanks for your reply; however, my question was slightly different. Your answer only guarantees that the first scanline in the buffer is properly aligned; I was asking about the other ones. I should probably tweak the stride in order to align *each* scanline.

Adriaan_van_Os
New Contributor I
@Axel My interpretation of Jon's answer is that it doesn't matter, because data is copied to (aligned) buffers anyway.

@myself With regard to posix_memalign, I have to add the following:

1. On Mac OS X, posix_memalign is available only since OS X 10.6.
2. There is a bug in posix_memalign where calling it with a size of 0 and an alignment smaller than 512 triggers internal corruption of malloc's data structures; the symptom is "malloc: *** error for object 0xc3fa24: incorrect checksum for freed object - object was probably modified after being freed" warnings and a later (seemingly random) crash somewhere in the program.