Hi, guys,
I am using icc 15.0.2, which is compatible with gcc 4.4.7. Whenever I allocate memory with malloc, the returned address is aligned to 16 bytes. I know that gcc's malloc provides this alignment on 64-bit processors. Does malloc under icc guarantee the same alignment? I think it affects the quality of vectorization, so I need to be sure that malloc under icc provides it as well.
Default 16-byte alignment for malloc is specified in the x86-64 ABI. If you have a case where it is not so, it may be a reportable bug.
When the compiler can see that a pointer inherits its alignment from malloc, it is entitled to assume that alignment. 16-byte alignment is not sufficient for full AVX optimization, since 256-bit AVX vectors need 32-byte alignment.
For a time, gcc had situations, not shared by icc, where stack objects weren't aligned. I think that was corrected before gcc 4.4.7, which has itself become outdated. It's reasonable to expect icc to perform equal or better alignment than gcc.
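If you want to confirm the alignment on your own system rather than rely on the ABI, a quick check is enough. The following is a minimal sketch; the helper names `is_aligned` and `count_malloc_aligned16` are made up for illustration and are not part of any library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if p is a multiple of `alignment` bytes, 0 otherwise.
   `alignment` must be non-zero. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return ((uintptr_t)p % alignment) == 0;
}

/* Allocates `count` blocks of different sizes with malloc and returns
   how many of the returned pointers were 16-byte aligned. */
int count_malloc_aligned16(int count)
{
    int aligned = 0;
    for (int i = 0; i < count; ++i) {
        void *p = malloc((size_t)(i + 1) * 24);
        if (p != NULL && is_aligned(p, 16))
            ++aligned;
        free(p);
    }
    return aligned;
}
```

If `count_malloc_aligned16(100)` returns 100 with both compilers, the two builds behave the same way for those sizes. This is only an empirical check, not a guarantee.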
Intel does not provide its own C or C++ runtime libraries, so the version of malloc you link in should be the same as GNU's.
You can use memalign or posix_memalign if you want to ensure a specific alignment.
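As a minimal sketch of the posix_memalign route: the wrapper name `alloc_aligned_doubles` is hypothetical, and the 32-byte figure used in the usage note is an assumption matching 256-bit AVX vectors.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocates room for n doubles at an address that is a multiple of
   `alignment` bytes. posix_memalign requires `alignment` to be a power
   of two and a multiple of sizeof(void *). Returns NULL on failure.
   The result is released with free(). */
double *alloc_aligned_doubles(size_t n, size_t alignment)
{
    void *p = NULL;
    if (posix_memalign(&p, alignment, n * sizeof(double)) != 0)
        return NULL;
    return (double *)p;
}
```

For example, `double *v = alloc_aligned_doubles(1000, 32);` requests a 32-byte-aligned array of 1000 doubles.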
Hi Sunwoo,
You don't need to align your data to benefit from vectorization. For instance, suppose that you have an array v of n = 1000 doubles and you want to run the following code:
for (int i = 0; i < n; ++i) { v[i] = 2 * v[i]; }
Most compilers, including the Intel compiler, will vectorize this code even though v is not 32-byte aligned (I assume that your CPU has 256-bit vectors, which is the case for modern Intel CPUs). Suppose that the address of v is 32 * k + 16 for some integer k. Since a double is 8 bytes, v + 2 is then 32-byte aligned. The compiler will do the following:
- Treat the loop iterations i =0 and i = 1 sequentially (loop peeling)
- Then treat i = 2, i = 3, i = 4, i = 5 with one vector instruction.
- Use vector instructions up to the last vector instruction for i = 994, i = 995, i = 996, i = 997
- Treat the loop iterations i = 998, i = 999 sequentially (remainder)
So, except for the very beginning and the very end of the loop, your code will get vectorized. You'll get a slight overhead for the loop peeling and the remainder, but with n = 1000, you won't feel anything.
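The peel / vector / remainder split described above can be worked out with a little arithmetic. This is only an illustrative sketch of the counting, not what any particular compiler emits; `loop_plan` and `plan_loop` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* How a loop over n elements is split when the start is misaligned. */
typedef struct {
    size_t peel;         /* scalar iterations before the first aligned element */
    size_t vector_iters; /* full-width vector iterations */
    size_t remainder;    /* scalar iterations left at the end */
} loop_plan;

/* addr: address of element 0; elem_size: bytes per element;
   vec_bytes: vector width in bytes (32 for 256-bit vectors). */
loop_plan plan_loop(uintptr_t addr, size_t n, size_t elem_size, size_t vec_bytes)
{
    assert(elem_size != 0 && vec_bytes % elem_size == 0);
    assert(addr % elem_size == 0); /* elements are naturally aligned */

    size_t lanes = vec_bytes / elem_size;
    size_t misalign = addr % vec_bytes;
    size_t peel = (misalign == 0) ? 0 : (vec_bytes - misalign) / elem_size;
    if (peel > n)
        peel = n;

    loop_plan plan;
    plan.peel = peel;
    plan.vector_iters = (n - peel) / lanes;
    plan.remainder = (n - peel) % lanes;
    return plan;
}
```

With an address of the form 32 * k + 16, n = 1000 and 8-byte doubles, this gives 2 peeled iterations, 249 vector iterations covering i = 2 to i = 997, and 2 remainder iterations, matching the list above.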
The problem comes when n is small enough that you can't neglect the loop peeling and the remainder. You also have a problem when you have two arrays in the same loop, such as:
for (int i = 0; i < n; ++i) { v[i] = 2 * w[i]; }
If v and w are not aligned, and in particular if they are misaligned by different amounts, peeling cannot make the accesses to both v[i], v[i + 1], v[i + 2], v[i + 3] and w[i], w[i + 1], w[i + 2], w[i + 3] aligned at the same time. Therefore, at least one of the accesses has to be unaligned, which *might* degrade performance. With a modern CPU, you most likely won't feel it (maybe a few percent slower, but most likely within the noise of a basic timer measurement).
So aligning for vectorization is not a must. It is something that should be done in special cases, when a profiler shows that it is needed. Intel Advisor is the only profiler I know of that can do this. Once you have identified the loops that might get some speedup from alignment, you need to:
- Align the memory: you might use _mm_malloc
- Tell the compiler that the pointer you are going to use is aligned: you might use OpenMP 4 (#pragma omp simd aligned(p : 32)) or the Intel-specific extension __assume_aligned
Aligning the memory without telling the compiler is useless.
In a nutshell:
1) Profile with Intel Advisor
2) Align your memory where needed AND tell the compiler you've done it
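Here is a minimal sketch of step 2, under a few assumptions. The function names `alloc32` and `scale_aligned` are made up for illustration. The allocation uses posix_memalign so the sketch builds without Intel headers; _mm_malloc / _mm_free would be the equivalent mentioned above. For compilers other than icc, the sketch swaps in the GCC/Clang builtin __builtin_assume_aligned, which is not part of the original advice.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Step "align the memory": n doubles on a 32-byte boundary, or NULL.
   Release with free(). */
double *alloc32(size_t n)
{
    void *p = NULL;
    if (posix_memalign(&p, 32, n * sizeof(double)) != 0)
        return NULL;
    return (double *)p;
}

/* Step "tell the compiler": v must really be 32-byte aligned, otherwise
   the promise below is false and the behavior is undefined. */
void scale_aligned(double *v, size_t n)
{
    assert((uintptr_t)v % 32 == 0);
#if defined(__INTEL_COMPILER)
    __assume_aligned(v, 32);             /* Intel extension */
#elif defined(__GNUC__)
    v = __builtin_assume_aligned(v, 32); /* GCC/Clang substitute */
#endif
    /* With OpenMP 4 SIMD support enabled, the same promise can be made by
       placing "#pragma omp simd aligned(v : 32)" directly above the loop. */
    for (size_t i = 0; i < n; ++i)
        v[i] = 2 * v[i];
}
```

Whether this actually changes the generated code or the run time depends on the compiler, its flags and the CPU, so it is worth confirming with the vectorization report or a profiler.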
