Intel® ISA Extensions

__m128 array becomes unaligned with IC optimization

TaylorIoTKidd
New Contributor I
I'm sure this question has been asked dozens of times. I just can't figure out how to phrase a search query that finds the answer.

PROBLEM: I allocated an array of __m128 (which should be 16-byte aligned) with new in the constructor, then used it in an _mm_dp_ps() operation inside a function. It worked fine with no optimization, but with full optimization in the Intel Compiler the data became unaligned and all sorts of bad things happened.

QUESTION: Isn't __m128 defined as 16-byte aligned? Is that alignment not guaranteed with optimization? If so, is this a bug? Or did I just do something silly that I can't see?
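For what it's worth, the type itself does carry the requirement: alignof(__m128) is 16. What was not guaranteed before C++17 is that new[] honors an alignment stricter than the fundamental one, which with Microsoft's 32-bit heap is typically 8 bytes. A likely explanation is that the pointer was never reliably aligned, and the optimized build simply started emitting aligned loads (movaps) that fault on it. A minimal sketch for checking a pointer right after allocation; the helper name is_aligned is hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // __m128

// The type's own requirement: the compiler assumes every __m128 object
// lives on a 16-byte boundary and may emit aligned loads/stores for it.
static_assert(alignof(__m128) == 16, "expected 16-byte alignment for __m128");

// Hypothetical helper: true if p is a multiple of `alignment` (a power of two).
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Putting assert(is_aligned(transMat_sse, alignof(__m128))); right after the new[] in the constructor would show whether the heap actually returned a 16-byte-aligned block.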

(By the way, I got around this problem by dynamically allocating aligned data using "__m128 *sse_result = (__m128*) _mm_malloc(4*sizeof(__m128), 16);")
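The _mm_malloc workaround can be made leak-proof by pairing it with _mm_free automatically. A sketch using std::unique_ptr with a custom deleter; MmFree, AlignedM128Array and make_aligned_m128 are illustrative names, not part of any library:

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <xmmintrin.h>  // __m128, _mm_malloc, _mm_free

// Deleter so unique_ptr releases _mm_malloc'd memory with _mm_free
// (never with delete[] or free()).
struct MmFree {
    void operator()(void* p) const noexcept { _mm_free(p); }
};

using AlignedM128Array = std::unique_ptr<__m128[], MmFree>;

// Allocates n __m128 values on a 16-byte boundary. __m128 is trivially
// constructible, so no placement new is needed; contents start uninitialized.
inline AlignedM128Array make_aligned_m128(std::size_t n) {
    void* raw = _mm_malloc(n * sizeof(__m128), alignof(__m128));
    if (!raw) throw std::bad_alloc();
    return AlignedM128Array(static_cast<__m128*>(raw));
}
```

Used as a class member, this removes the need for a hand-written destructor and makes the class move-only rather than accidentally double-freeing on copy.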

--
Taylor

CODE SNIPPETS:

Here are snippets of my code:

In the class definition:

__m128 *transMat_sse; //the sepia transformation matrix

In the constructor:

transMat_sse = new __m128[pixelComponentNum];

//Sepia SSE transformation matrix (MS version)
transMat_sse[0] = _mm_set_ps(0.393f, 0.769f, 0.189f, 0.0f);
transMat_sse[1] = _mm_set_ps(0.349f, 0.686f, 0.168f, 0.0f);
transMat_sse[2] = _mm_set_ps(0.272f, 0.534f, 0.131f, 0.0f);
transMat_sse[3] = _mm_set_ps(0.0f, 0.0f, 0.0f, 1.0f);
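Separately from alignment, note that _mm_set_ps takes its arguments from the highest lane down, so _mm_set_ps(0.393f, 0.769f, 0.189f, 0.0f) puts 0.0f in lane 0 (the lowest address when stored), while _mm_setr_ps takes them in memory order. Whether the matrix rows above want the reversed order depends on the pixel layout, which the post doesn't show; this small sketch (the helper lanes is hypothetical) just makes the difference visible:

```cpp
#include <xmmintrin.h>  // _mm_set_ps, _mm_setr_ps, _mm_storeu_ps

// Hypothetical helper: writes the four lanes of v to out[0..3], lane 0 first
// (memory order). Uses the unaligned store so `out` needs no special alignment.
inline void lanes(__m128 v, float out[4]) {
    _mm_storeu_ps(out, v);
}
```

With _mm_set_ps the first row is stored as {0.0f, 0.189f, 0.769f, 0.393f}; with _mm_setr_ps it is stored in the order written.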

My compiler arguments: /c /O2 /Ob2 /Oi /Qipo /I "\\include" /I ".\\Workloads" /I ".\\external\\vtune\\include" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_CRT_SECURE_NO_WARNINGS" /D "_MBCS" /EHsc /MD /GS /Gy /fp:fast /Fo"Win32\\Release_Intel/" /W3 /nologo /Zi /Qwd10121 /Qopenmp /QaxSSE4.2 /QxSSE2 /Q_multisrc-

bronxzv
New Contributor II

I don't know if it's a bug or a (lack of) feature of the Intel compiler, but what I'd advise anyone to do is use custom allocators (align 16 for SSE, 32 for AVX) together with placement new / new[] so that the constructors are called. One basic (and slow) allocator may simply use _mm_malloc / _mm_free.
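A sketch of that approach for element types with constructors, using _mm_malloc / _mm_free as the basic allocator. It constructs elements one at a time with placement new rather than placement new[], since array placement new may, on some compilers, request extra bytes for bookkeeping, which would shift the elements off the alignment boundary. aligned_new_array, aligned_delete_array and Particle are hypothetical names:

```cpp
#include <cstddef>
#include <new>
#include <xmmintrin.h>  // __m128, _mm_malloc, _mm_free, _mm_setzero_ps

// Allocates n objects of T on an `align`-byte boundary (align >= alignof(T))
// and default-constructs each one. If a constructor throws, the objects built
// so far are destroyed and the memory is released before rethrowing.
template <class T>
T* aligned_new_array(std::size_t n, std::size_t align) {
    void* raw = _mm_malloc(n * sizeof(T), align);
    if (!raw) throw std::bad_alloc();
    T* p = static_cast<T*>(raw);
    std::size_t built = 0;
    try {
        for (; built < n; ++built) ::new (static_cast<void*>(p + built)) T();
    } catch (...) {
        while (built > 0) p[--built].~T();
        _mm_free(raw);
        throw;
    }
    return p;
}

// Destroys the n objects in reverse order, then frees the block.
template <class T>
void aligned_delete_array(T* p, std::size_t n) {
    if (!p) return;
    while (n > 0) p[--n].~T();
    _mm_free(p);
}

// Hypothetical element type: an SSE vector plus a constructor that must run.
struct Particle {
    __m128 pos;
    int id;
    Particle() : pos(_mm_setzero_ps()), id(-1) {}
};
```

Usage: Particle* ps = aligned_new_array<Particle>(n, 16); ... aligned_delete_array(ps, n); passing 32 instead of 16 for AVX types.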

NB: it's what I've done since my early tests with SSE on the Katmai Pentium III, and it worked well for porting 20,000+ lines of source code (and was easily adapted recently to handle AVX alignment).
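Making the alignment a template parameter is one way to get that easy move to AVX: switching from 16 to 32 is a one-token change. A minimal sketch of a standard-conforming allocator built on _mm_malloc / _mm_free and usable with std::vector; AlignedAllocator and AlignedFloats are hypothetical names:

```cpp
#include <cstddef>
#include <new>
#include <vector>
#include <xmmintrin.h>  // _mm_malloc, _mm_free

// Allocator whose storage is aligned to Align bytes (16 for SSE, 32 for AVX).
template <class T, std::size_t Align>
struct AlignedAllocator {
    static_assert(Align >= alignof(T), "Align must not weaken T's alignment");
    using value_type = T;

    // Needed because allocator_traits cannot rebind a non-type parameter.
    template <class U>
    struct rebind { using other = AlignedAllocator<U, Align>; };

    AlignedAllocator() noexcept = default;
    template <class U>
    AlignedAllocator(const AlignedAllocator<U, Align>&) noexcept {}

    T* allocate(std::size_t n) {
        void* p = _mm_malloc(n * sizeof(T), Align);
        if (!p) throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { _mm_free(p); }
};

// Stateless: any two instances can free each other's memory.
template <class T, class U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return true; }
template <class T, class U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return false; }

// Convenience alias: floats laid out for SSE (16) or AVX (32) aligned loads.
template <std::size_t Align>
using AlignedFloats = std::vector<float, AlignedAllocator<float, Align>>;
```

Only data() is guaranteed to be aligned; element i is aligned for vector loads only when i * sizeof(float) is a multiple of Align.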

TimP
Honored Contributor III
Intel compilers don't replace malloc(). On Windows you get the one provided by Microsoft, so you might consider _aligned_malloc() if you're looking for a solution that should be portable across the currently supported versions of Windows.
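If the code also has to build outside Windows, one option is a thin wrapper that uses _aligned_malloc / _aligned_free there and POSIX posix_memalign / free elsewhere. A sketch; portable_aligned_alloc and portable_aligned_free are hypothetical names. The two halves must always be paired, because memory from _aligned_malloc cannot be released with plain free():

```cpp
#include <cstddef>
#include <cstdlib>
#if defined(_WIN32)
#include <malloc.h>  // _aligned_malloc, _aligned_free
#endif

// Allocates `size` bytes on an `alignment`-byte boundary. alignment must be a
// power of two (and, for posix_memalign, a multiple of sizeof(void*)).
// Returns nullptr on failure.
inline void* portable_aligned_alloc(std::size_t size, std::size_t alignment) {
#if defined(_WIN32)
    return _aligned_malloc(size, alignment);
#else
    void* p = nullptr;
    return posix_memalign(&p, alignment, size) == 0 ? p : nullptr;
#endif
}

// Releases memory obtained from portable_aligned_alloc (and nothing else).
inline void portable_aligned_free(void* p) {
#if defined(_WIN32)
    _aligned_free(p);
#else
    std::free(p);
#endif
}
```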
levicki
Valued Contributor I
A long time ago I complained about new[] and malloc() not returning memory aligned for modern (and now intrinsic) vector data types.

I find that particular "feature" a weakness of a stagnating language, one that should have been corrected by now.

Alas, nobody dares to confront those dinosaurs on the C++ committee, and these days it seems more important to add dozens of different flavors of threading extensions, creating a paradox of choice, instead of giving us one worthy, well-thought-out interface.

OpenMP, TBB, STM, Cilk, CEAN... what do they have in common?

- They are all unfinished attempts at making parallelization easy.
- They are all trying to solve different aspects of parallelism instead of providing a single all-around solution.
- They all make me want to give up on software development, because to cover all cases I need to learn all of them. Learning five things instead of one means being average at all five instead of mastering a single one.

IMO C++ has started suffering from "feature creep" instead of fixing some old design oversights.

Sorry about the rant.
TimP
Honored Contributor III
Quoting Harbison & Steele, 2nd edition (1987), just to point out that the situation has been this way for quite a while: "functions such as malloc, ....., always return pointers of type char * aligned on a boundary suitable for an object of any type." It's a stretch to imagine this being consistent with 32-bit Windows malloc, even if you restrict "any type" to the data types defined in standard C. I hope to see an indication on this forum soon of what is intended for AVX types.
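In today's C and C++ terms, the "any type" in that guarantee means any type with fundamental alignment, i.e. alignof(std::max_align_t), so whether plain malloc is enough for a vector type can be checked at compile time. A sketch; malloc_alignment_suffices is a hypothetical helper. As far as I know, MSVC defines std::max_align_t as double, so the check comes out false for __m128 there, while on x86-64 Linux it is typically true:

```cpp
#include <cstddef>
#include <xmmintrin.h>  // __m128

// malloc (and pre-C++17 operator new) only promise fundamental alignment,
// i.e. alignof(std::max_align_t). A type needing more is "over-aligned".
template <class T>
constexpr bool malloc_alignment_suffices() {
    return alignof(T) <= alignof(std::max_align_t);
}

// Platform-dependent answer for the SSE type discussed in this thread.
constexpr bool m128_ok_with_malloc = malloc_alignment_suffices<__m128>();
```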
In principle, new[] ought to respect the alignment specified in a type's definition, e.g. new __m512. Of course, relying on what C++ ought to do is a sure path to unreliability.
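For what it's worth, this is what C++17 eventually did (proposal P0035R4): when a type's alignment exceeds __STDCPP_DEFAULT_NEW_ALIGNMENT__, new and new[] call the std::align_val_t overloads of operator new, so the result satisfies alignof of the type. A sketch assuming a compiler in C++17 mode; Vec8, make_vec8_array and make_m128_array are hypothetical names:

```cpp
// Requires C++17 aligned new (e.g. -std=c++17 or /std:c++17).
#include <cstddef>
#include <new>
#include <xmmintrin.h>  // __m128

#if !defined(__cpp_aligned_new)
#error "this sketch relies on C++17 aligned new"
#endif

// AVX-sized block with a 32-byte requirement, stricter than the default
// new alignment on common ABIs.
struct alignas(32) Vec8 {
    float v[8];
};

// In C++17 this resolves to operator new[](std::size_t, std::align_val_t),
// and delete[] picks the matching aligned operator delete[].
inline Vec8* make_vec8_array(std::size_t n) {
    return new Vec8[n];
}

// The type from the original question: C++17 requires the result to satisfy
// alignof(__m128) whether or not the aligned overload is needed on this ABI.
inline __m128* make_m128_array(std::size_t n) {
    return new __m128[n];
}
```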

The sponsors of TBB, Cilk, and Ct apparently now share a declared intention to be incompatible with OpenMP, and a hope that OpenMP will no longer be advocated. My personal belief is that this attitude should provoke a backlash from those who care about multi-language applications (not accepting these various C++ namespaces as satisfying the requirement for multiple languages).
levicki
Valued Contributor I
Tim, I couldn't agree more. Well said.