Intel® ISA Extensions

_mm_load_ps generates VMOVUPS

emmanuel_attia
Beginner

Hi all,

I've tested the following case using Intel XE Compiler 2011.3 and 2013.4.

I have a question. Take a very basic SSE function:

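[cpp]#include <stdio.h>      // printf
#include <xmmintrin.h>  // SSE intrinsics (_mm_load_ps, _mm_add_ps, _mm_store_ps)

void test1(float * pool)
{
    // _mm_load_ps / _mm_store_ps require pool to be 16-byte aligned
    __m128 v = _mm_load_ps(pool);
    __m128 a = _mm_load_ps(pool + 8);   // 8 floats = 20h bytes

    _mm_store_ps(pool + 16, _mm_add_ps(v, a));   // 16 floats = 40h bytes

    printf("test1: %g\n", pool[16]);
}[/cpp]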

If I compile it without specific flags, I get the expected SSE code: aligned loads (explicit for pool, implicit for pool + 20h) and an aligned store (pool + 40h):

[plain]00E410A3  movaps      xmm0,xmmword ptr [eax]
00E410A6  addps       xmm0,xmmword ptr [eax+20h]
00E410AA  movaps      xmmword ptr [eax+40h],xmm0 [/plain]

If I compile it using AVX, I get an unaligned load for pool, an implicit aligned load for pool + 20h, and an unaligned store for pool + 40h:

[plain]002F10A3  vmovups     ymm0,xmmword ptr [eax]
002F10A7  vaddps      ymm1,ymm0,xmmword ptr [eax+20h]
002F10AC  vmovups     xmmword ptr [eax+40h],xmm1[/plain]

Is this expected? Does this affect performance?
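For anyone who wants to reproduce this, here is a minimal sketch of the same function with the alignment requirement made explicit. `_mm_load_ps` and `_mm_store_ps` still require 16-byte-aligned addresses whatever instruction the compiler picks, but `vmovups` and VEX-encoded memory operands do not fault on a misaligned address, so an assertion is one way to keep catching bad pointers in debug builds. The names `is_aligned16`, `add_blocks` and `demo_add_blocks` are hypothetical and not part of the original code.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics: _mm_load_ps, _mm_add_ps, _mm_store_ps

// Hypothetical helper: true if p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Same arithmetic as test1, but the alignment contract is checked in software
// instead of relying on movaps faulting on a misaligned address.
inline float add_blocks(float* pool) {
    assert(is_aligned16(pool));                  // pool + 8 and pool + 16 are then aligned too
    __m128 v = _mm_load_ps(pool);                // floats 0..3
    __m128 a = _mm_load_ps(pool + 8);            // floats 8..11 (offset 20h)
    _mm_store_ps(pool + 16, _mm_add_ps(v, a));   // floats 16..19 (offset 40h)
    return pool[16];
}

// Hypothetical driver: builds an aligned buffer and runs the function once.
inline float demo_add_blocks() {
    alignas(16) float pool[20] = {};             // 20 floats cover offsets 0..19
    pool[0] = 1.5f;
    pool[8] = 2.25f;
    return add_blocks(pool);                     // 1.5 + 2.25
}
```

Compiling this with and without /QxAVX and comparing the disassembly of `add_blocks` should show the same instruction choices as in the listings above, assuming a comparable compiler version.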

Kind regards

4 Replies
emmanuel_attia
Beginner

When I say "I compile it using AVX", I mean /QxAVX under Windows. My project uses AVX elsewhere, so not using this flag would mean either emulating AVX instructions with SSE or mixing legacy and VEX instructions, which is a performance disaster.

Bernard
Valued Contributor I

Look at this post: http://software.intel.com/en-us/forums/topic/278573

emmanuel_attia
Beginner

OK, after benchmarking random-access loads and stores, it seems that VMOVUPS on XMM registers takes the same time as MOVAPS when the memory is aligned.
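A rough sketch of how such a comparison could be set up is below. It is not the benchmark I actually ran: it reads sequentially rather than at random, and `sum_aligned`, `sum_unaligned`, `time_ms` and `demo_sums_match` are hypothetical names. Note that with /QxAVX the compiler may emit the same instruction for both loops, so the disassembly should be checked before trusting any timing.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics

// Sums n floats (n a multiple of 4) from a 16-byte-aligned buffer using aligned loads.
inline float sum_aligned(const float* p, std::size_t n) {
    __m128 acc = _mm_setzero_ps();
    for (std::size_t i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_load_ps(p + i));
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

// Same loop, but with the unaligned-load intrinsic on the same (aligned) data.
inline float sum_unaligned(const float* p, std::size_t n) {
    __m128 acc = _mm_setzero_ps();
    for (std::size_t i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(p + i));
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

// Runs f() reps times and returns the elapsed wall-clock time in milliseconds.
template <typename F>
inline double time_ms(F&& f, int reps) {
    volatile float sink = 0.0f;  // keeps the calls from being optimized away
    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r)
        sink = f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Hypothetical driver: checks that both variants agree, then times one of them.
inline bool demo_sums_match() {
    alignas(16) float buf[1024];
    for (std::size_t i = 0; i < 1024; ++i) buf[i] = 1.0f;
    float a = sum_aligned(buf, 1024);
    float u = sum_unaligned(buf, 1024);
    double ms = time_ms([&] { return sum_aligned(buf, 1024); }, 1000);
    return a == 1024.0f && u == 1024.0f && ms >= 0.0;
}
```

Timing `sum_unaligned` the same way and comparing the two figures over many repetitions gives a first impression. A buffer this small stays in cache, so this only measures the instruction cost, not memory bandwidth.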

Thanks a lot

Bernard
Valued Contributor I

You are welcome.
