Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Double precision data alignment for MKL

New Contributor III
My Fortran project relies on natural (default) alignment for derived types, which should guarantee 64-bit alignment for real(kind=8). But in the recent webinar introducing MKL 11 bitwise reproducibility, it was stated that 128-bit alignment should be used as a general default (although the slide said 64-bit). I got the feeling that this is likely to increase further as hardware vectors get wider in the future. So, will the compiler's default alignment be changed too?
2 Replies
Valued Contributor I

For the MKL 11.0 CBWR (Conditional Bitwise Reproducibility) feature, we recommend 128-byte alignment for input/output data and buffers passed to MKL functions as arguments. That should let your application keep working for the next 8+ years without changing alignments. Another option is to allocate memory with the mkl_malloc() function, which returns memory at the alignment you request. It is also OK to use the actual alignment required by the width of the SIMD registers, if you are absolutely sure about your current architecture:

128-bit registers (SSE2) require 16-byte alignment

256-bit registers (AVX, AVX2) require 32-byte alignment

512-bit registers (MIC) require 64-byte alignment
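
To make the mapping above concrete, here is a minimal C sketch that converts a register width in bits to the matching alignment in bytes and checks whether a pointer sits on such a boundary. The function names simd_alignment_bytes and is_aligned are hypothetical helpers written for this illustration; they are not part of MKL or of any compiler runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: alignment in bytes that matches a SIMD register
   of the given width in bits (128 -> 16, 256 -> 32, 512 -> 64). */
size_t simd_alignment_bytes(unsigned register_bits)
{
    assert(register_bits % 8 == 0);
    return (size_t)register_bits / 8;
}

/* Hypothetical helper: returns 1 if p lies on an `alignment`-byte boundary,
   0 otherwise. `alignment` must be a nonzero power of two. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

A call such as is_aligned(buffer, simd_alignment_bytes(256)) can be placed in a debug assertion just before an array is handed to a library routine, so that a misaligned buffer is caught early instead of silently affecting performance or reproducibility.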

As for types such as Fortran REAL(KIND=8), the compiler still uses the standard 8-byte alignment.
The system malloc() function uses the same 8-byte alignment.

Black Belt
According to the x64 (x86_64) OS ABIs, the compiler ought to use 16-byte alignment by default for arrays and malloc(). With the AVX compiler option, I've been lucky so far in seeing 32-byte alignments.
Next year, there may be an Intel compiler option for larger default alignments, to deal with the expected requirement for 64-byte alignment in the not-too-distant future. It's not necessarily practical right now to obtain 64-byte or larger alignments, although it can be done in C or C++ with alignment modifiers and wrappers around malloc.
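
As a rough illustration of the "wrapper around malloc" approach, the C sketch below over-allocates, rounds the address up to the requested boundary, and stores the original pointer just in front of the aligned block so it can be freed later. The names aligned_malloc_sketch and aligned_free_sketch are made up for this example and are not from MKL or any compiler. Where available, standard facilities such as C11 aligned_alloc() or POSIX posix_memalign() are probably the better choice.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `size` bytes on an `alignment`-byte boundary.
   `alignment` must be a nonzero power of two. Returns NULL on failure. */
void *aligned_malloc_sketch(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* Extra room for worst-case padding plus the saved raw pointer. */
    size_t extra = alignment - 1 + sizeof(void *);
    if (size > SIZE_MAX - extra)
        return NULL;

    void *raw = malloc(size + extra);
    if (raw == NULL)
        return NULL;

    /* Skip the slot for the saved pointer, then round up to the boundary. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);

    /* Remember the pointer malloc() returned, directly before the block. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Hypothetical helper: release a block obtained from aligned_malloc_sketch(). */
void aligned_free_sketch(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

For example, aligned_malloc_sketch(n * sizeof(double), 64) would return a buffer suitable for 512-bit registers. A block obtained this way must be released with aligned_free_sketch(), never with plain free().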