przemodz

Problem - _mm_load_pd generates movaps

My program does some computing on matrices.
I was surprised to find that the Visual Studio compiler generated faster code than icl.
I checked the project settings in VS and I think everything is set correctly.
Command line looks like this:

/c /O2 /Qipo /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /EHsc /MT /GS /arch:SSE3 /fp:fast /Fo"x64\\Release/" /W1 /nologo /Qopenmp

I disassembled the .obj file and found that icl generates movaps instructions in place of the _mm_load_pd intrinsics
(WTF?), and that the generated code uses only 8 xmm registers. (WTF2?)

The code generated by VS looks OK: movapd in place of _mm_load_pd, and all available registers are used.

Am I doing something wrong? Are there hidden compiler settings or something? Really, wtf?

My configuration:
Intel C++ Intel 64 Compiler XE 12.0.2.154
Visual Studio 2005
Windows 7 Professional 64 bit

CPU: Intel Pentium T4200
Motherboard: Acer JV50


TimP

You don't give much information here. Use of movaps in place of movapd is a standard optimization: both perform the same aligned 128-bit load, and movaps saves 1 byte of code. It's conceivable that accidental alignments might come out worse. MSVC from VS2005 is often not as fast as the versions from VS2008SP1 or VS2010.
You can't tell from the number of different named registers whether there will be a physical difference, as hardware register renaming will make use of more registers. I'm trying to remember how long it's been since I saw a CPU without hardware renaming; it makes me feel my age.
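
To make the movaps/movapd point concrete, here is a minimal sketch (not the original poster's code) that loads the same 16 aligned bytes once through _mm_load_pd and once through _mm_load_ps, then compares the register contents bit for bit. The names aligned_pair and loads_are_bitwise_identical are made up for this illustration, and the compiler is free to emit either instruction for either intrinsic; the sketch only shows that the loaded value is the same.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE _mm_load_ps */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the __m128d member forces 16-byte alignment,
   which both aligned-load intrinsics require. */
typedef union {
    __m128d v;
    double  d[2];
} aligned_pair;

/* Returns 1 if an aligned load "as packed doubles" and an aligned load
   "as packed floats" of the same memory give identical bits, else 0. */
int loads_are_bitwise_identical(void)
{
    aligned_pair src;
    double as_pd[2];
    double as_ps[2];
    __m128d via_pd;
    __m128d via_ps;

    src.d[0] = 1.5;
    src.d[1] = -2.25;
    assert(((uintptr_t)src.d % 16) == 0);

    /* Nominally movapd xmm, [mem]  (encoding 66 0F 28 /r). */
    via_pd = _mm_load_pd(src.d);

    /* Nominally movaps xmm, [mem]  (encoding 0F 28 /r, one byte shorter).
       _mm_castps_pd only reinterprets the register; it converts nothing. */
    via_ps = _mm_castps_pd(_mm_load_ps((const float *)src.d));

    /* Store both registers and compare the raw bytes. */
    _mm_storeu_pd(as_pd, via_pd);
    _mm_storeu_pd(as_ps, via_ps);
    return memcmp(as_pd, as_ps, sizeof as_pd) == 0;
}
```

Calling loads_are_bitwise_identical() from a small test program should return 1 on any SSE2-capable x86 CPU, which is why a compiler can substitute the shorter movaps encoding for a load written with _mm_load_pd.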