Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Cross-compiling for IA32 on Windows 7 64-bit to avoid "out of memory"

fgp_phlo_org
Beginner

Hi

I'm trying to build a heavily templated application for IA32 with the Intel C++ Compiler for Windows, version 12.1.5.344, running on 64-bit Windows 7. Unfortunately, the IA32-targeting icl.exe (and mcpcom.exe) seem to be 32-bit binaries, and they error out after trying to allocate more than 4GB (which is obviously impossible for a 32-bit binary).

Is there a 64-bit version of the Intel C++ compiler available that is able to target IA32? It seems that currently only the reverse is supported, i.e. an IA32 binary that produces code for Intel64. Can I somehow convince Intel64/icl.exe to produce code for IA32?

I know that the Linux version of the Intel C++ Compiler *does* support that kind of cross-compiling, but that doesn't help, since I need to target Windows, not Linux. Unless there's a way to cross-compile on Linux for Windows, of course...

If there's no support for that kind of cross-compiling, are there any compiler flags I might use to conserve memory, apart from disabling inlining? (My app absolutely depends on inlining for performance; there are a lot of functions that compile to a single SSE instruction.) I'm already using /Qip-, which seems to help a bit, but maybe there are others...

best regards,
Florian Pflug

jimdempseyatthecove
Honored Contributor III
Florian,

I will defer the issue of using a 64-bit compiler to produce a 32-bit app to the Intel readers of this post. (This would not be an unreasonable request.)

I suspect that you have a

  #include "SingleHeaderThatBringsInAllTemplates.h"

Consider fragmenting large templates into smaller functional units then including only the templates necessary for the current .cpp file.

Jim Dempsey
SergeyKostrov
Valued Contributor II
Hi Florian,

Quoting fgp.phlo.org
...I'm trying to build a heavily templated application for IA32 with the Intel C++ Compiler for Windows, version 12.1.5.344, running on 64-bit Windows 7. Unfortunately, the IA32-targeting icl.exe (and mcpcom.exe) seem to be 32-bit binaries, and they error out after trying to allocate more than 4GB (which is obviously impossible for a 32-bit binary).

[SergeyK] That is correct. A regular Win32 application cannot allocate more than 2GB of memory. The best allocation number that I was able to get is ~1.99GB with a MinGW C++ compiler.

A non-regular Win32 application that uses Address Windowing Extensions (AWE, a technology from Microsoft) could allocate more than 2GB of memory.


Are these errors from the Intel C++ compiler or from your application?

Best regards,
Sergey

SergeyKostrov
Valued Contributor II
It looks like spam continues. Please take a look at a previous post [ on Thu, 09/20/2012 - 22:35 ].
HLW_S_
Beginner
From the Intel C++ compiler. That's why I was looking for a compiler that targets IA-32, yet is itself a 64-bit application. I've since discovered that running the IA-32 compiler on 64-bit Windows 7 helps a bit. The compiler still can't allocate more than 4GB, of course, but at least it can get about 3.8GB. Probably that's because there's no need for the kernel address space to lie within the first 4GB if the kernel itself runs in 64-bit mode. (Dunno why it still reserves ~200MB, but my guess is that it's a DMA zone for legacy PCI hardware which cannot address more than 4GB.)
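For reference, the limits discussed in this thread follow from simple arithmetic. A 32-bit pointer spans 2^32 bytes = 4 GiB. By default, a 32-bit Windows process gets 2 GiB of user-mode address space. A /LARGEADDRESSAWARE image gets up to 3 GiB on 32-bit Windows booted with the increased user-VA option (/3GB), and the full 4 GiB when it runs under WOW64 on 64-bit Windows. The sketch below encodes those commonly documented figures as constants (the names are illustrative, not from any Windows header) and checks that the ~3.8GB observed above fits only in the last configuration.

```cpp
#include <cstdint>

// Illustrative constants (hypothetical names) for the commonly documented
// user-mode virtual address space sizes of a 32-bit Windows process.
constexpr std::uint64_t GiB = 1ull << 30;

// A 32-bit pointer can address at most 2^32 bytes in total.
constexpr std::uint64_t kPointerRange32 = 1ull << 32;

// Default: 2 GiB user / 2 GiB kernel split.
constexpr std::uint64_t kUserVaDefault = 2 * GiB;

// /LARGEADDRESSAWARE image on 32-bit Windows booted with /3GB: up to 3 GiB.
constexpr std::uint64_t kUserVaLaa32BitOs = 3 * GiB;

// /LARGEADDRESSAWARE image under WOW64 on 64-bit Windows: the kernel lives
// outside the 32-bit range, so all of it is available to user mode.
constexpr std::uint64_t kUserVaLaa64BitOs = 4 * GiB;

static_assert(kUserVaLaa64BitOs == kPointerRange32,
              "LAA on a 64-bit OS gets the full 32-bit range");

// True if a total of `bytes` could fit in a user VA of size `userVa`
// (ignoring fragmentation and space taken by code, DLLs and stacks).
constexpr bool fitsIn(std::uint64_t bytes, std::uint64_t userVa) {
    return bytes < userVa;
}
```

The remaining gap between 3.8GB and 4GiB is consistent with the image, its DLLs, thread stacks and fragmentation occupying part of the range, though the thread does not establish the exact breakdown.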
JenniferJ
Moderator
The Intel C++ compiler for IA-32 is built with "/LARGEADDRESSAWARE", so it can get close to 4GB on an x64 OS. It seems your case is very extreme, or maybe there is a compiler bug. Does your code build successfully with MSVC?

Jennifer
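Whether a given 32-bit executable (for example icl.exe or mcpcom.exe) was linked with /LARGEADDRESSAWARE is recorded as the IMAGE_FILE_LARGE_ADDRESS_AWARE bit (0x0020) in the Characteristics field of its PE/COFF file header; Microsoft's `dumpbin /headers` lists it among the file header characteristics. The sketch below reads that bit straight from the file bytes, so it does not need any Windows headers. The function names are hypothetical, and it does only the minimal bounds checks needed for this one field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a little-endian value of `bytes` bytes (PE headers are little-endian).
inline std::uint32_t readLE(const std::vector<std::uint8_t>& b,
                            std::size_t off, int bytes) {
    assert(off + static_cast<std::size_t>(bytes) <= b.size());
    std::uint32_t v = 0;
    for (int i = bytes - 1; i >= 0; --i) v = (v << 8) | b[off + i];
    return v;
}

// Hypothetical helper: true if the PE image has IMAGE_FILE_LARGE_ADDRESS_AWARE
// set; false for anything that does not look like a PE file.
inline bool isLargeAddressAware(const std::vector<std::uint8_t>& image) {
    constexpr std::uint32_t kLargeAddressAware = 0x0020;
    // DOS header: "MZ" magic; e_lfanew (offset of the "PE\0\0" signature) at 0x3C.
    if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z') return false;
    const std::size_t peOff = readLE(image, 0x3C, 4);
    // The 20-byte COFF file header follows the 4-byte signature;
    // Characteristics is its last field, at offset 18.
    const std::size_t charOff = peOff + 4 + 18;
    if (charOff + 2 > image.size()) return false;
    if (image[peOff] != 'P' || image[peOff + 1] != 'E' ||
        image[peOff + 2] != 0 || image[peOff + 3] != 0) return false;
    return (readLE(image, charOff, 2) & kLargeAddressAware) != 0;
}
```

To check a real binary, read the whole file into a `std::vector<std::uint8_t>` (for example with `std::ifstream` opened in binary mode and `std::istreambuf_iterator<char>`) and pass it in.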
HLW_S_
Beginner
It builds successfully with both GCC and Clang on Linux and Mac OS X. It doesn't build with MSVC due to MSVC's poor support for SSE vectors as member variables, which, BTW, is the reason I turned to Intel's C++ Compiler in the first place.

I've meanwhile managed to get ICC to compile the thing by using explicit template instantiation. My code contains 30 or so instantiations of the same templated code, adapted via template arguments for slightly different use cases. With the help of explicit template instantiation, some preprocessor magic and a rather complex build script, I now compile each instantiation separately, which drives memory usage down to a couple of hundred MB. The cost is a huge increase in conceptual complexity; keeping track of all required instantiations of these templates manually really isn't fun :-( On the upside, I can now compile selected parts with /Qinline-forceinline enabled, which seems to bring about another 10% performance.

In conclusion, my problem is solved for now, but given how common 64-bit OSes are nowadays, it still seems silly to have to optimize for compiler memory usage. So, @Intel: Please consider making 64-bit builds of your IA-32-targeting compilers available.
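The workaround described above (explicit template instantiation plus some preprocessor help) might look roughly like the sketch below. Everything in it is hypothetical: `Kernel`, its `Lanes` parameter and the `KERNEL_INSTANTIATIONS` list stand in for the real templates. The header declares each instantiation with `extern template`, which tells every including translation unit not to instantiate those specializations itself. In a real build, each `template struct Kernel<N>;` definition would sit in its own .cpp file, so each compiler process instantiates only one; here everything is in one file so the sketch compiles on its own. Note that explicit instantiation declarations do not suppress instantiation of inline functions, which is why the member is defined out of class.

```cpp
#include <cassert>
#include <cstddef>

// --- kernels.h (hypothetical) --------------------------------------------
template <int Lanes>
struct Kernel {
    static int run(const int* in, std::size_t n);
};

// Defined out of class, so it is NOT implicitly inline; `extern template`
// would not stop implicit instantiation of an inline member.
template <int Lanes>
int Kernel<Lanes>::run(const int* in, std::size_t n) {
    assert(in != nullptr || n == 0);
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += in[i] * Lanes;
    return sum;
}

// Single list of every instantiation the program uses (an "X-macro").
#define KERNEL_INSTANTIATIONS(X) X(1) X(2) X(4)

// Explicit instantiation declarations: translation units that include this
// header must not instantiate these specializations themselves.
#define DECLARE_KERNEL(N) extern template struct Kernel<N>;
KERNEL_INSTANTIATIONS(DECLARE_KERNEL)
#undef DECLARE_KERNEL

// --- kernel_N.cpp (hypothetical) -----------------------------------------
// In a real build each definition would live in its own .cpp, e.g. a file
// containing only `template struct Kernel<KERNEL_N>;` with KERNEL_N set on
// the command line. They are all defined here to keep the sketch standalone.
#define DEFINE_KERNEL(N) template struct Kernel<N>;
KERNEL_INSTANTIATIONS(DEFINE_KERNEL)
#undef DEFINE_KERNEL
```

Keeping the list in one macro means the declarations and the per-file definitions cannot drift apart, which addresses part of the bookkeeping burden mentioned above; callers that need a body inlined still have to see and instantiate it.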
SergeyKostrov
Valued Contributor II
>>...I've by now discovered that running the IA-32 compiler on 64-bit Windows 7 helps a bit. The compiler still can't allocate more than 4GB,
>>of course, but at least it can get about 3.8GB.

A Win32 application without Microsoft's AWE cannot allocate more than 2GB of memory. This is by design, and it is simply impossible for a 32-bit application to allocate 3.8GB (in the regular case). The non-regular case requires a 32-bit Microsoft operating system that supports AWE, and that option is only supported in server editions. I've done lots of testing on 32-bit platforms, and the maximum amount of memory my test application was able to allocate was about 1.9GB. It also depends on the C++ compiler a developer uses and on the complexity of the test application.

>>...Please consider making 64-bit builds of your IA-32 targetting compilers available.

That's a good proposal, but I don't think Intel will do it. It was briefly discussed that two different versions of the Intel C++ compiler have to be used in order to build 32-bit or 64-bit applications. Microsoft, GCC, MinGW, etc. follow the same path, that is, different C++ compilers for different platforms.
SergeyKostrov
Valued Contributor II
>>...so it can get close to 4GB on a x64 OS.
>>
>>It seems your case is very extreme...

I think an extreme case is when an application uses more than 1TB of memory. Amounts of memory like 4GB or 8GB are no longer considered unique or extreme. Also, here is a quote from MSDN:

...64-bit Windows supports up to 1 terabyte of physical memory with 8 terabytes of address space for each process...
SergeyKostrov
Valued Contributor II
>>... It doesn't build with MSVC due to MSVC's poor support for SSE vectors as member variables...

That looks very strange. Could you provide an isolated example that shows the problem?
HLW_S_
Beginner
Yup, try this:

struct T {
  __m128i data;
};

T add(T a, T b) {
  const T result = { _mm_add_epi32(a.data, b.data) };
  return result;
}

MSVC complains that T may not be passed by value, since it requires alignment of more than 8 bytes (it requires 16-byte alignment to fulfill __m128i's alignment requirements). No other compiler I tried has the slightest problem with this.
JenniferJ
Moderator
HLW S. wrote:

So, @Intel: Please consider making 64-bit builds of your IA-32-targeting compilers available.

This is a big feature request. Please file a ticket at Intel Premier Support (https://premier.intel.com/) as well.

Jennifer
SergeyKostrov
Valued Contributor II
>>...The Intel C++ for ia32 is built with "/LARGEADDRESSAWARE". So it can get close to 4GB on a x64 OS...

Even if that option could be used in a 32-bit VS project, it does not resolve the 2GB limitation for a regular 32-bit application on a 32-bit Windows platform that does not support AWE.
SergeyKostrov
Valued Contributor II
>>struct T {
>>  __m128i data;
>>};

Please take into account that 'T' is a reserved word and it is used in C++ templates. Thanks for the test case; I'll try it.

Best regards,
Sergey
HLW_S_
Beginner
Sergey Kostrov wrote:
Even if that option could be used in a 32-bit VS project it does not resolve a problem of 2GB limitation for a regular 32-bit application on a 32-bit Windows platform that does not support AWE.
True, but irrelevant. This is not about an arbitrary application; it's about one very specific application, namely the Intel C++ Compiler.
Sergey Kostrov wrote:
Please take into account that 'T' is a reserved word and it is used in C++ templates. Thanks for the test-case and I'll try it.
That is wrong. 'T' is not, and never was, a reserved word. It's a common name for template parameters, but there's nothing special about it.
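A minimal check of that point: the snippet below uses `T` both as the name of an ordinary struct and as a template parameter name, and it compiles as standard C++ (the names `twice` and `useT` are just for illustration). Inside the template, the parameter `T` hides the namespace-scope struct `T`; outside it, `T` means the struct again.

```cpp
// 'T' is an ordinary identifier, not a keyword: it can name a struct...
struct T {
    int value;
};

// ...and, separately, a template parameter. Inside this template the
// parameter T hides the struct ::T; outside it, T refers to the struct.
template <typename T>
constexpr T twice(T x) {
    return x + x;
}

constexpr int useT() {
    return twice(T{21}.value);  // T{21} is the struct; twice<int> is deduced
}
```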

I consider my question to be answered. What I hoped for doesn't seem to exist, and I've found a workaround.

SergeyKostrov
Valued Contributor II
>>...No other compiler I tried has the slightest problem with this.

What C++ compilers did you try?
SergeyKostrov
Valued Contributor II
>>...MSVC complains that T may not be passed by value...

I got a different compilation error with your unmodified test case:

C2719 - The align __declspec modifier is not permitted on function parameters.
SergeyKostrov
Valued Contributor II
HLW S. wrote:

Yup, try this


struct T {
  __m128i data;
};

T add(T a, T b) {
  const T result = { _mm_add_epi32(a.data, b.data) };
  return result;
}

MSVC complains that T may not be passed by value, since it requires alignment of more than 8 bytes (it requires 16-byte alignment to fulfill __m128i's alignment requirements). No other compiler I tried has the slightest problem with this.

There was a declaration error; try this instead:

typedef struct tagUserT
{
  __m128i m_Data;
} UserT;

const UserT AddUserData( UserT &a, UserT &b );

const UserT AddUserData( UserT &a, UserT &b )
{
  UserT ut;
  ut.m_Data = _mm_add_epi32( a.m_Data, b.m_Data );
  return ( UserT )ut;
}

I compiled it with the Microsoft C++ compiler of VS 2005.
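A self-contained variant of the by-reference workaround, using the struct from the original test case and SSE2 intrinsics from `<emmintrin.h>`; it assumes an x86 or x86-64 target with SSE2. Taking the operands as `const T&` means the 16-byte-aligned struct never has to be passed by value on the argument stack, which is what MSVC rejects in this thread (at least when targeting 32-bit x86). The result is still returned by value, as in Sergey's version above. The alignment asserts are an added debug check, not something from the posts.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2: __m128i, _mm_add_epi32

struct T {
    __m128i data;  // requires 16-byte alignment
};

// Operands by const reference instead of by value, so no over-aligned
// argument has to be copied onto the x86 argument stack.
inline T add(const T& a, const T& b) {
    // Debug check: the referenced objects must be suitably aligned.
    assert(reinterpret_cast<std::uintptr_t>(&a) % alignof(T) == 0);
    assert(reinterpret_cast<std::uintptr_t>(&b) % alignof(T) == 0);
    const T result = { _mm_add_epi32(a.data, b.data) };
    return result;
}
```

Call sites do not change (`T c = add(a, b);`), so switching the parameters to references is usually a local edit.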
SergeyKostrov
Valued Contributor II
This is a short follow-up. Here is a test case:

...
UserT utA = { 1 };
UserT utB = { 2 };
UserT utC = { 0 };
...
utC = AddUserData( utA, utB );
...

Verified with Intel, Microsoft and MinGW C/C++ compilers.
HLW_S_
Beginner
Sergey Kostrov wrote:
const UserT AddUserData( UserT &a, UserT &b );
Yup, if you pass by reference it works. The point is that MSVC complains if you pass by value. Thanks for your tests, though.
JenniferJ
Moderator
Hello Florian Pflug,

I've sent you a private message regarding the request for a native x64 compiler for IA32 apps; please respond. Alternatively, if you could submit it to Intel Premier Support, that would be great. Just let me know the ticket number.

Thank you,
Jennifer