Hi all,
I've tested the following case with Intel XE Compiler 2011.3 and 2013.4, and I have a question. Let's take a very basic SSE function:
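[cpp]#include <cstdio>
#include <xmmintrin.h>

void test1(float * pool)
{
    __m128 v = _mm_load_ps(pool);               // pool must be 16-byte aligned
    __m128 a = _mm_load_ps(pool + 8);           // 8 floats = 32 bytes = 20h
    _mm_store_ps(pool + 16, _mm_add_ps(v, a));  // 16 floats = 64 bytes = 40h
    printf("test1: %g\n", pool[16]);
}[/cpp]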
If I compile it without any specific flags, I get the expected SSE code: an aligned load (explicit for pool, implicit for pool + 20h) and an aligned store (pool + 40h):
[plain]00E410A3 movaps xmm0,xmmword ptr [eax]
00E410A6 addps xmm0,xmmword ptr [eax+20h]
00E410AA movaps xmmword ptr [eax+40h],xmm0 [/plain]
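The 20h and 40h offsets in the listing follow directly from the pointer arithmetic on `float`, and `_mm_load_ps` / `_mm_store_ps` require 16-byte-aligned addresses. Here is a minimal sketch of that relationship. The names `is_aligned16`, `add_blocks` and `run_demo` are hypothetical helpers introduced only for illustration; they are not part of the original test case.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper: true when p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Same arithmetic as test1, returning the first result lane instead of printing.
inline float add_blocks(float* pool) {
    assert(is_aligned16(pool));                 // required by _mm_load_ps / _mm_store_ps
    __m128 v = _mm_load_ps(pool);               // byte offset 00h
    __m128 a = _mm_load_ps(pool + 8);           // 8 floats * 4 bytes = 20h
    _mm_store_ps(pool + 16, _mm_add_ps(v, a));  // 16 floats * 4 bytes = 40h
    return pool[16];
}

// Hypothetical driver: builds a 16-byte-aligned pool and runs add_blocks on it.
inline float run_demo() {
    alignas(16) float pool[24] = {};
    pool[0] = 1.5f;
    pool[8] = 2.25f;
    return add_blocks(pool);
}
```

Since the base pointer is 16-byte aligned and both offsets are multiples of 16 bytes, all three accesses are aligned regardless of which move instruction the compiler picks.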
If I compile it using AVX, I get an unaligned load for pool, an implicit aligned load for pool + 20h, and an unaligned store for pool + 40h:
[plain]002F10A3 vmovups ymm0,xmmword ptr [eax]
002F10A7 vaddps ymm1,ymm0,xmmword ptr [eax+20h]
002F10AC vmovups xmmword ptr [eax+40h],xmm1 [/plain]
Is this expected? Does it affect performance?
Kind regards
When I say "I compile it using AVX", I mean /QxAVX under Windows. My project uses AVX elsewhere, so not using this flag ends up either emulating AVX instructions with SSE or mixing legacy and VEX instructions, which is a performance disaster.
Look at this post: http://software.intel.com/en-us/forums/topic/278573
OK, after benchmarking random-access loads and stores, it seems that VMOVUPS on XMM registers takes the same time as MOVAPS when the memory is aligned.
Thanks a lot
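For anyone who wants to repeat that kind of comparison, here is a rough sketch of a micro-benchmark. It is not the benchmark used above: it walks the buffer sequentially rather than at random, and the names `sum_aligned`, `sum_unaligned` and `time_ms` are hypothetical. It assumes the compiler emits the VEX forms of the moves when the file is built with an AVX option such as /QxAVX, and the legacy SSE forms otherwise.

```cpp
#include <chrono>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics

// Sums n floats (n must be a multiple of 4) using aligned loads.
inline float sum_aligned(const float* p, std::size_t n) {
    __m128 acc = _mm_setzero_ps();
    for (std::size_t i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_load_ps(p + i));   // p + i must be 16-byte aligned
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

// Same loop with unaligned loads; works for any float pointer.
inline float sum_unaligned(const float* p, std::size_t n) {
    __m128 acc = _mm_setzero_ps();
    for (std::size_t i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(p + i));  // no alignment requirement
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

// Runs f() `repeats` times and returns the elapsed wall time in milliseconds.
template <class F>
double time_ms(F f, int repeats) {
    volatile float sink = 0.0f;  // keeps the compiler from discarding the result
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) sink = f();
    auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A typical use would be to fill a 16-byte-aligned buffer, time both functions on it with the same repeat count, and compare the two numbers. Timings from such a small loop are noisy, so treat a single run as indicative only.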
You are welcome.
