emmanuel_attia
Beginner

_mm_load_ps generates VMOVUPS

Hi all,

I've tested the following case with Intel XE Compiler 2011.3 and 2013.4.

I have a question; let's take a very basic SSE function:

[cpp]#include <stdio.h>
#include <xmmintrin.h> // SSE intrinsics

void test1(float * pool)
{
    __m128 v = _mm_load_ps(pool);      // aligned load from pool
    __m128 a = _mm_load_ps(pool + 8);  // aligned load from pool + 20h

    _mm_store_ps(pool + 16, _mm_add_ps(v, a)); // aligned store to pool + 40h

    printf("test1: %g\n", pool[16]);
}[/cpp]

If I compile it without specific flags, I get the expected SSE code: aligned loads (explicit for pool, implicit as a memory operand for pool + 20h) and an aligned store (pool + 40h):

[plain]00E410A3  movaps      xmm0,xmmword ptr [eax]
00E410A6  addps       xmm0,xmmword ptr [eax+20h]
00E410AA  movaps      xmmword ptr [eax+40h],xmm0 [/plain]

If I compile it with AVX, I get an unaligned load for pool, an implicit aligned load for pool + 20h, and an unaligned store for pool + 40h:

[plain]002F10A3  vmovups     ymm0,xmmword ptr [eax]
002F10A7  vaddps      ymm1,ymm0,xmmword ptr [eax+20h]
002F10AC  vmovups     xmmword ptr [eax+40h],xmm1[/plain]

Is this expected? Does this affect performance?

Kind regards

4 Replies
emmanuel_attia
Beginner

When I say "I compile it using AVX", I mean /QxAVX under Windows. My project uses AVX elsewhere, so dropping this flag would mean either emulating AVX instructions with SSE or mixing legacy and VEX-encoded instructions, which is a performance disaster.

Bernard
Black Belt

Look at this post: http://software.intel.com/en-us/forums/topic/278573

emmanuel_attia
Beginner

OK, after benchmarking random-access loads/stores, it seems VMOVUPS (on XMM registers) matches MOVAPS in execution time when the memory is aligned.

Thanks a lot

Bernard
Black Belt

You are welcome.
