Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Problem: _mm_load_pd generates movaps

My program does some computation on matrices.
I was surprised to find that the Visual Studio compiler generated faster code than icl.
I checked the project settings in VS and I think everything is set correctly.
The command line looks like this:

/c /O2 /Qipo /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /EHsc /MT /GS /arch:SSE3 /fp:fast /Fo"x64\\Release/" /W1 /nologo /Qopenmp

I disassembled the .obj file and found that icl emits movaps instructions for the _mm_load_pd intrinsics (why?), and the generated code uses only 8 xmm registers (why again?).

The code generated by VS looks OK: it emits movapd for each _mm_load_pd, and all available registers are used.
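When comparing what two compilers emit for the same intrinsic, a small standalone test case makes the comparison reproducible. The sketch below is hypothetical and is not the poster's code: the function name add_rows and the assumption of an even element count are illustrative. Compiling it with each compiler and disassembling the resulting object file shows whether each _mm_load_pd became movapd or movaps.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics: _mm_load_pd, _mm_add_pd, _mm_store_pd
#include <cstddef>

// Hypothetical minimal test case: element-wise sum of two rows of doubles.
// a, b and out must be 16-byte aligned, because _mm_load_pd and
// _mm_store_pd require aligned addresses. n is assumed to be even.
void add_rows(const double* a, const double* b, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; i += 2) {
        __m128d va = _mm_load_pd(a + i);           // aligned 128-bit load of 2 doubles
        __m128d vb = _mm_load_pd(b + i);
        _mm_store_pd(out + i, _mm_add_pd(va, vb)); // aligned 128-bit store
    }
}
```

Declaring the inputs with alignas(16) (or allocating them with an aligned allocator) satisfies the alignment requirement.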

Am I doing something wrong? Is there a hidden compiler setting or something like that? Really, what is going on?

My configuration:
Intel C++ Intel 64 Compiler XE
Visual Studio 2005
Windows 7 Professional 64 bit

CPU: Intel Pentium T4200
Motherboard: Acer JV50

Black Belt
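You don't give much information here. Using movaps in place of movapd is a standard optimization: for an aligned load both instructions move the same 128 bits, and movaps saves 1 byte of code because its encoding lacks the 66h prefix. It's conceivable that the resulting shifts in code alignment happen to come out worse. Also, the compiler from VS2005 often generates slower code than the ones from VS2008 SP1 or VS2010.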
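The point that movaps and movapd are interchangeable for loads can be checked at the data level. The sketch below (the function name loads_match is hypothetical) loads the same aligned 16 bytes once as doubles with _mm_load_pd and once as floats with _mm_load_ps, reinterprets the second result as __m128d with _mm_castps_pd (no conversion), and compares the bits. It does not show which instruction a compiler actually picks; an optimizer may well emit movaps for both.

```cpp
#include <emmintrin.h>  // SSE2: _mm_load_pd, _mm_castps_pd, _mm_storeu_pd (includes SSE _mm_load_ps)
#include <cstring>

// Returns true if the "double" aligned load and the "float" aligned load
// of the same 16 bytes produce bit-identical registers.
// p must be 16-byte aligned.
bool loads_match(const double* p) {
    __m128d as_pd = _mm_load_pd(p);  // typically compiled to movapd (or movaps)
    __m128d as_ps = _mm_castps_pd(   // bit reinterpretation only
        _mm_load_ps(reinterpret_cast<const float*>(p)));  // typically movaps
    double x[2], y[2];
    _mm_storeu_pd(x, as_pd);
    _mm_storeu_pd(y, as_ps);
    // Compare raw bytes so that NaN payloads would also be compared exactly.
    return std::memcmp(x, y, sizeof x) == 0;
}
```

Since the loaded bits are identical, any speed difference between the two compilers' output has to come from elsewhere, such as code alignment or scheduling.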
You can't tell from the number of distinct register names in the code whether there will be a physical difference, because hardware register renaming maps them onto a larger set of physical registers. I'm trying to remember how long it's been since I saw a CPU without hardware renaming; it makes me feel my age.