Hi guys,
I've read many times that data alignment is critical for maximising speed, but I am not quite sure how to communicate that information to the compiler. I've attached a file that highlights the main characteristics of my app.
In a nutshell, I am dealing with images, which are represented by a class that contains a pointer to the data. This pointer is allocated with _aligned_malloc(), and each row is also padded to a multiple of "MemoryAlignment" (=64) so that the beginning of each row is also aligned. I access the 2D data in the standard way, i.e. with 2 nested loops (for all rows, for all cols, ...).
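A minimal C++17 sketch of the layout described above, for readers who want something concrete. The class name Image, the float pixel type and the member names are hypothetical; it uses the portable aligned operator new (std::align_val_t) in place of the Windows-only _aligned_malloc(), and rounds each row up to a multiple of 64 bytes so that every row start is aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// "MemoryAlignment" from the question: every row must start on a 64-byte boundary.
constexpr std::size_t kMemoryAlignment = 64;

// Hypothetical image class: one aligned buffer of float pixels,
// with each row padded so that the next row starts aligned as well.
class Image {
public:
    Image(std::size_t width, std::size_t height)
        : width_(width), height_(height), stride_(paddedStride(width)),
          data_(static_cast<float*>(::operator new(
              stride_ * height * sizeof(float), std::align_val_t(kMemoryAlignment)))) {}

    ~Image() { ::operator delete(data_, std::align_val_t(kMemoryAlignment)); }

    Image(const Image&) = delete;             // owns a raw buffer: no copies
    Image& operator=(const Image&) = delete;

    // Row length in floats, rounded up so each row occupies a multiple of 64 bytes.
    static std::size_t paddedStride(std::size_t width) {
        const std::size_t floatsPerBlock = kMemoryAlignment / sizeof(float);
        return (width + floatsPerBlock - 1) / floatsPerBlock * floatsPerBlock;
    }

    // First pixel of row y; aligned because the buffer and stride_ are.
    float* row(std::size_t y) {
        assert(y < height_);
        return data_ + y * stride_;
    }

    std::size_t width() const { return width_; }
    std::size_t height() const { return height_; }
    std::size_t stride() const { return stride_; }

private:
    std::size_t width_;
    std::size_t height_;
    std::size_t stride_;  // in floats, including padding
    float* data_;
};
```

With 4-byte floats, a 100-pixel-wide image gets a stride of 112 floats (448 bytes = 7 × 64), so row(y) is 64-byte aligned for every y.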
How can I tell the compiler that all data are aligned?
Is it possible/necessary to inform the compiler that each row is aligned? Or is it sufficient to simply state before the nested loops that myImage.data is aligned?
Thanks in advance
Alex
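I believe __declspec(align(N)) float FARR, for example, will align FARR on an N-byte boundary (the GCC/Linux spelling is __attribute__((aligned(N)))). N must be a power of two, and there is an upper limit on it (MSVC documents a maximum of 8192 bytes).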
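For reference, a small sketch using the standard C++11 alignas specifier, which plays the same role as the compiler-specific forms above. The array name farr, its size and the helper isAligned are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Standard C++11 spelling of the compiler-specific forms
//   __declspec(align(64)) float farr[256];          (MSVC / Intel on Windows)
//   float farr[256] __attribute__((aligned(64)));   (GCC / Clang / Intel on Linux)
alignas(64) float farr[256];

// True if p lies on an n-byte boundary; n must be a power of two.
inline bool isAligned(const void* p, std::size_t n) {
    assert(n != 0 && (n & (n - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (n - 1)) == 0;
}
```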
On x86 processors, many SSE instructions (the ones operating on the XMM registers) require their memory operands to be 16-byte aligned; otherwise they raise a general-protection exception. Some instructions have unaligned equivalents (e.g. MOVUPS instead of MOVAPS), but on many processors these are noticeably slower.
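A sketch of the two kinds of SSE load, assuming an x86 target and <xmmintrin.h>: _mm_load_ps typically compiles to MOVAPS and requires a 16-byte-aligned address, while _mm_loadu_ps (MOVUPS) accepts any address. The function names are hypothetical, and n is assumed to be a multiple of 4 to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Horizontal sum of the four lanes of v.
static float hsum(__m128 v) {
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

// Aligned loads: p must be 16-byte aligned. A misaligned address is undefined
// behaviour and typically raises a general-protection fault.
float sumAligned(const float* p, std::size_t n) {
    assert(reinterpret_cast<std::uintptr_t>(p) % 16 == 0 && n % 4 == 0);
    __m128 acc = _mm_setzero_ps();
    for (std::size_t i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_load_ps(p + i));
    return hsum(acc);
}

// Same loop with unaligned loads: valid for any float address.
float sumUnaligned(const float* p, std::size_t n) {
    assert(n % 4 == 0);
    __m128 acc = _mm_setzero_ps();
    for (std::size_t i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(p + i));
    return hsum(acc);
}
```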
If you use a 2D array, all its elements are contiguous, so (given that the first row is aligned) every row will be aligned only if the row size in bytes is a multiple of N.
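The row-size condition can be checked at compile time. This sketch assumes 4-byte floats and 64-byte alignment; the names kAlign, kRows, paddedWidth and image, and the 100-pixel width, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kAlign = 64;  // required row alignment in bytes
constexpr std::size_t kRows = 4;    // hypothetical image height

// Smallest row length (in floats) >= width whose byte size is a multiple of kAlign.
constexpr std::size_t paddedWidth(std::size_t width) {
    return (width * sizeof(float) + kAlign - 1) / kAlign * kAlign / sizeof(float);
}

static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");
// 100 floats = 400 bytes, not a multiple of 64: row 1 would start misaligned.
static_assert((100 * sizeof(float)) % kAlign != 0, "unpadded rows are misaligned");
// Padded to 112 floats = 448 bytes = 7 * 64: every row starts aligned.
static_assert(paddedWidth(100) == 112, "padded width");

// alignas aligns only the start of the array (row 0); the padding does the rest.
alignas(kAlign) float image[kRows][paddedWidth(100)];

// Run-time check that every row starts on a kAlign boundary.
bool allRowsAligned() {
    for (std::size_t y = 0; y < kRows; ++y)
        if (reinterpret_cast<std::uintptr_t>(&image[y][0]) % kAlign != 0)
            return false;
    return true;
}
```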
The Intel software optimization manuals have examples.
#pragma vector aligned?
This is equivalent to #pragma vector always, with the additional assertion that the arrays are aligned. So it will attempt to vectorize the following for() loop without applying cost-benefit analysis, and without allowing for unaligned arrays. The first element of each array section in use in the loop must be aligned.
I doubt it will work on outer loops.
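A minimal sketch of per-row alignment hints for the nested loops in the question, assuming float pixels and a row stride (in floats) that keeps every row 64-byte aligned; the function name scaleImage is hypothetical. #pragma vector aligned is applied only under the classic Intel compiler (__INTEL_COMPILER); for GCC/Clang the sketch uses __builtin_assume_aligned on each row pointer instead. Because the pragma covers only the inner loop that follows it, the promise is made per row rather than once for myImage.data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Multiplies every pixel of a row-padded float image by `factor`.
// Assumes: data is 64-byte aligned and stride (in floats) is a multiple of 16,
// so the first element of every row is 64-byte aligned.
void scaleImage(float* data, std::size_t width, std::size_t height,
                std::size_t stride, float factor) {
    for (std::size_t y = 0; y < height; ++y) {
        float* row = data + y * stride;
        assert(reinterpret_cast<std::uintptr_t>(row) % 64 == 0);
#if defined(__GNUC__)
        // GCC/Clang: tell the optimizer this row pointer is 64-byte aligned.
        row = static_cast<float*>(__builtin_assume_aligned(row, 64));
#endif
#if defined(__INTEL_COMPILER)
        // Intel classic compiler: vectorize the next loop assuming aligned arrays.
#pragma vector aligned
#endif
        for (std::size_t x = 0; x < width; ++x)
            row[x] *= factor;
    }
}
```

In debug builds the assert catches a stride that would leave some rows misaligned, which is exactly the case where the aligned assertion would be wrong.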
>> The first element of each array section in use in the loop must be aligned.
Thanks, this answers my question perfectly.
Just one more question to completely satisfy my curiosity: I included "#pragma vector aligned", but it did not improve the speed. Would it be correct to conclude that previously, without this pragma, the same machine code was generated anyway, i.e. load instructions for aligned arrays?
Is it also correct to say that without the pragma, two versions of the inner loop are generated, one for aligned arrays and one for unaligned arrays, and the correct version is selected at run time? And that if the pragma is included, only the aligned version is generated?
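The two-version scheme described in this question can be written out by hand. This is a sketch of the idea only, not the code any particular compiler emits; it assumes x86 SSE and float rows, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Version 1: row must be 16-byte aligned (aligned loads/stores).
static void scaleRowAligned(float* row, std::size_t n, float f) {
    const __m128 vf = _mm_set1_ps(f);
    std::size_t x = 0;
    for (; x + 4 <= n; x += 4)
        _mm_store_ps(row + x, _mm_mul_ps(_mm_load_ps(row + x), vf));
    for (; x < n; ++x)  // scalar remainder when n is not a multiple of 4
        row[x] *= f;
}

// Version 2: any address (unaligned loads/stores).
static void scaleRowUnaligned(float* row, std::size_t n, float f) {
    const __m128 vf = _mm_set1_ps(f);
    std::size_t x = 0;
    for (; x + 4 <= n; x += 4)
        _mm_storeu_ps(row + x, _mm_mul_ps(_mm_loadu_ps(row + x), vf));
    for (; x < n; ++x)
        row[x] *= f;
}

// Run-time selection between the two versions based on the row address.
void scaleRow(float* row, std::size_t n, float f) {
    if (reinterpret_cast<std::uintptr_t>(row) % 16 == 0)
        scaleRowAligned(row, n, f);
    else
        scaleRowUnaligned(row, n, f);
}
```

If every row is already aligned, only the first branch is ever taken, which would be consistent with an aligned-only assertion bringing no measurable speedup.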
Thanks again,
Alex
There is also a chance that the code cannot be vectorized by the compiler at all. In that case the #pragma won't help; you could try to vectorize it by hand if it comes to that.
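If it does come to vectorizing by hand, one common pattern is loop peeling: process a few scalar elements until the pointer reaches a 16-byte boundary, run an aligned SIMD body, then finish the remainder with scalar code. A minimal SSE sketch for a hypothetical per-row clamp operation, assuming x86, float pixels and non-NaN inputs:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Clamps row[0..n) to [lo, hi] in place. Assumes no NaN inputs
// (std::max and _mm_max_ps treat NaN differently).
void clampRow(float* row, std::size_t n, float lo, float hi) {
    std::size_t x = 0;
    // Prologue: scalar steps until row + x reaches a 16-byte boundary.
    while (x < n && reinterpret_cast<std::uintptr_t>(row + x) % 16 != 0) {
        row[x] = std::min(std::max(row[x], lo), hi);
        ++x;
    }
    // Body: four floats at a time with aligned loads and stores.
    const __m128 vlo = _mm_set1_ps(lo);
    const __m128 vhi = _mm_set1_ps(hi);
    for (; x + 4 <= n; x += 4) {
        const __m128 v = _mm_load_ps(row + x);
        _mm_store_ps(row + x, _mm_min_ps(_mm_max_ps(v, vlo), vhi));
    }
    // Epilogue: leftover elements.
    for (; x < n; ++x)
        row[x] = std::min(std::max(row[x], lo), hi);
}
```

Peeling can line up only one pointer: if a loop reads two rows whose addresses differ modulo 16, one of them still needs unaligned loads.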