Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Alignment Issues

adi_shavit
Beginner
Hi,

In a recent post on the OpenCV Y!G (http://groups.yahoo.com/group/OpenCV/message/31359), the topic of memory alignment came up.

Does IPP (and IPPI specifically) have any alignment requirements? IPL used to prefer 8-byte alignment, but IPP's Alloc function gives 4-byte alignment (32-bit). OpenCV seems to use 4-byte alignment too.

Can you give a detailed description of all the alignment issues and how they affect the performance of IPP?

Please advise.
Thanks,
Adi
5 Replies
Vladimir_Dudnik
Employee
Hi Adi,
Glad to see you again on the forum :)
You can find the specifications of the IPP memory allocation functions in the IPP documentation. For example, Intel Integrated Performance Primitives, volume 2: Image and Video Processing, chapter 3 (Support Functions), states:

Memory Allocation Functions

ippiMalloc: Allocates memory aligned to a 32-byte boundary.
ippiFree: Frees memory allocated by the function ippiMalloc.

Alignment affects performance because the processor pipeline stalls when a memory access crosses a cache-line boundary. So you will generally want to minimize such accesses, and one way to do that is to align your memory buffers.
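
To make the cache-line point concrete, here is a minimal Python sketch (not IPP code). It assumes a 64-byte cache line, which is common on x86 processors but is only an assumption here, and the helper name crosses_cache_line is made up for illustration.

```python
def crosses_cache_line(address: int, size: int, line_size: int = 64) -> bool:
    """Return True if the byte range [address, address + size) spans more than one cache line.

    line_size defaults to 64 bytes (an assumption; check your target CPU).
    """
    if size <= 0 or line_size <= 0:
        raise ValueError("size and line_size must be positive")
    first_line = address // line_size               # cache line holding the first byte
    last_line = (address + size - 1) // line_size   # cache line holding the last byte
    return first_line != last_line


# A 16-byte load starting at offset 48 stays inside one 64-byte line,
# while the same load starting at offset 56 straddles two lines.
print(crosses_cache_line(48, 16))  # False
print(crosses_cache_line(56, 16))  # True
```

When the buffer start is aligned to the load size and the load size divides the line size, loads issued at multiples of that size never straddle a line, which is the situation aligned allocation is meant to create.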
Regards,
Vladimir
adi_shavit
Beginner
Hi Vladimir,

I never left. I mostly lurk...

I saw the paragraph you quote before I posted and I'm still puzzled.

Indeed, your quote means that ippiMalloc() will really take care of 2 things:

1. Make sure the base address is aligned
2. Pad each image row with the appropriate number of pixels.

This means that each row can be processed in WHOLE chunks of 4 bytes.

AFAIK, MMX and its kin are more comfortable with bigger chunks (8, 16?), hence IPL's original preference for 8-byte alignment.

Does this mean that IPP actually allocates memory that will be used less effectively in cases where the row size is a multiple of 4 but not of 8?
An 11-pixel-wide image (one byte per pixel) will get a 12-byte width step with 4-byte alignment.
With 8-byte alignment it would get a 16-byte width step.
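
The width-step arithmetic above can be sketched as follows. This is a minimal Python illustration, assuming an 8-bit single-channel image (one byte per pixel); aligned_step is a hypothetical helper, not an IPP or OpenCV function.

```python
def aligned_step(row_bytes: int, alignment: int) -> int:
    """Round a row size in bytes up to the next multiple of `alignment`."""
    if row_bytes <= 0 or alignment <= 0:
        raise ValueError("row_bytes and alignment must be positive")
    return ((row_bytes + alignment - 1) // alignment) * alignment


# An 11-pixel row at one byte per pixel, padded to different alignments.
print(aligned_step(11, 4))   # 12
print(aligned_step(11, 8))   # 16
print(aligned_step(11, 32))  # 32

# A 12-byte row is already a multiple of 4, but still needs padding for 8.
print(aligned_step(12, 4))   # 12
print(aligned_step(12, 8))   # 16
```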

Also, the Y!G poster mentioned that IPP will be more efficient with 8 or 16 byte alignment. Is there any truth in this?
If so, then why do the allocation functions return 4-byte aligned data?

Thanks,
Adi
Vladimir_Dudnik
Employee
Adi,
You are right, ippiMalloc takes care of allocating memory so that the start address of each image row is aligned. That means that for a 1x1 image ippiMalloc will allocate 32x1 bytes of memory (for one color channel and the Ipp8u data type). Such an image can therefore be processed effectively in 4-byte, 8-byte, 16-byte and 32-byte chunks (which chunk size works most efficiently is architecture dependent and can vary between architectures).
It is true that IPP will allocate some extra memory to achieve this alignment, which is what makes SIMD processing efficient.
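
As a rough illustration of why a 32-byte step keeps every row usable at all of those chunk sizes, here is a small Python sketch (not IPP code). The base address 0x1000 and the helper names row_addresses and all_aligned are made up for the example; it compares a 32-byte step with the 12-byte step that 4-byte padding would give an 11-pixel row.

```python
def row_addresses(base: int, step: int, rows: int) -> list[int]:
    """Start address of each row for a buffer at `base` with `step` bytes per row."""
    return [base + y * step for y in range(rows)]


def all_aligned(addresses: list[int], alignment: int) -> bool:
    """True if every address is a multiple of `alignment`."""
    return all(address % alignment == 0 for address in addresses)


BASE = 0x1000  # hypothetical base address, a multiple of 32

padded_rows = row_addresses(BASE, step=32, rows=4)  # step padded to 32 bytes
tight_rows = row_addresses(BASE, step=12, rows=4)   # step padded to 4 bytes only

for alignment in (4, 8, 16, 32):
    print(alignment, all_aligned(padded_rows, alignment), all_aligned(tight_rows, alignment))
# 4 True True
# 8 True False
# 16 True False
# 32 True False
```

With the 12-byte step only the first row starts on an 8-, 16- or 32-byte boundary, so an aligned base address is not enough unless the step is padded as well.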
Regards,
Vladimir
adi_shavit
Beginner
Duh!!
I'm so stupid!
I saw the 32-byte alignment statement and read it as 32-bit alignment (4 bytes). Hence my confusion.
I'm sorry for wasting your time.
Thanks,
Adi
Vladimir_Dudnik
Employee
Great, never mind.
Vladimir