Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Double precision data alignment for MKL

Andrew_Smith
New Contributor III
My Fortran project relies on natural (default) alignment for derived types, which should guarantee 64-bit alignment for real(kind=8). But in the recent webinar introducing MKL 11 bitwise reproducibility, it was stated that 128-bit alignment should be used as a general default (although the slide said 64-bit). I got the feeling that it was likely to increase further as hardware vectors get wider in future. So, will the compiler's default alignment be changed too?
2 Replies
barragan_villanueva_
Valued Contributor I
Hi,

For the MKL 11.0 CBWR feature, we recommend 128-byte alignment for input/output data and buffers passed to MKL functions as arguments. This will guarantee that your application can keep working for at least the next 8 years without changing alignments. Another way is to allocate memory with the mkl_malloc() function, which returns memory aligned to the boundary you request. But it's OK to use the alignment that matches the size of the vector registers if you are absolutely sure about your current architecture:

128-bit registers (SSE2) require 16-byte alignment

256-bit registers (AVX, AVX2) require 32-byte alignment

512-bit registers (MIC) require 64-byte alignment

As for types like Fortran REAL(KIND=8), the compiler still uses the standard 8-byte alignment.
The same 8-byte alignment is used by the system malloc() function.
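
To see which of the alignments listed above a given buffer actually satisfies, the address can be tested directly. The following is only a minimal sketch in C; the helper names `is_aligned` and `max_alignment_of` are made up for illustration and are not part of MKL or any compiler runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if ptr lies on an `alignment`-byte boundary.
   `alignment` must be a nonzero power of two (16, 32, 64, 128, ...). */
static bool is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Hypothetical helper: largest power-of-two alignment, up to `cap`,
   that the address of ptr satisfies. */
static size_t max_alignment_of(const void *ptr, size_t cap)
{
    size_t a = 1;
    while (a < cap && is_aligned(ptr, a * 2)) {
        a *= 2;
    }
    return a;
}
```

Calling `max_alignment_of(p, 128)` on a pointer returned by malloc(), or on the address of an array passed in from Fortran, reports the boundary that particular allocation happened to land on. A single favorable result is not a guarantee for later allocations.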

TimP
Honored Contributor III
According to the x64 (x86_64) OS ABI, the compiler ought to set a default 16-byte alignment for arrays and malloc(). With the AVX compiler option, I've been lucky so far in seeing 32-byte alignments.
Next year, there may be an Intel compiler option for larger default alignments, to deal with the expected requirement for 64-byte alignment in the not too distant future. It's not necessarily practical right now to obtain 64-byte or larger alignments, although it can be done in C or C++ with the alignment modifiers and wrappers around malloc.
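
As an illustration of the kind of wrapper mentioned above, here is a rough sketch in C. It assumes a C11 runtime that provides aligned_alloc(); the function name `alloc_aligned_doubles` is hypothetical, and this is not how MKL or the Intel compiler allocates memory internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: allocate room for n doubles on an
   `alignment`-byte boundary (a nonzero power of two, e.g. 64).
   Returns NULL on failure; release the memory with free().
   Overflow of n * sizeof(double) is not checked in this sketch. */
static double *alloc_aligned_doubles(size_t n, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    size_t bytes = n * sizeof(double);

    /* C11 aligned_alloc expects the size to be a multiple of the
       alignment, so round the request up to the next multiple. */
    size_t rounded = (bytes + alignment - 1) & ~(alignment - 1);
    if (rounded == 0) {
        rounded = alignment;
    }
    return aligned_alloc(alignment, rounded);
}
```

For example, `alloc_aligned_doubles(1000, 64)` requests a 64-byte aligned buffer of 1000 doubles. With MKL available, mkl_malloc(size, alignment) paired with mkl_free() serves the same purpose without a hand-written wrapper.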