Hello,
I have found a white paper about the compatibility of the Intel compilers
with the GNU compilers, and about compiling the Linux kernel with them.
I would appreciate it if Intel developers could comment on one issue from similar
compilation research.
I am trying to build a low-level project which doesn't, and actually can't, use
FPU/MMX/SSE instructions (exactly like the Linux kernel) using Intel Compiler
10/11 for Linux, but the resulting code contains MMX/SSE instruction
sequences which were generated by ICL itself.
Specifying the options "-mia32", "-mcpu=pentium" or "-m32" doesn't help.
Can someone please tell me how this problem was solved for Linux kernel compilation?
Which option was used, or which tricks did you use, to avoid MMX/SSE code?
With best regards,
Andrey Mirkin
11 Replies
I wasn't aware that avoiding SSE was a goal of projects which use icc to build the Linux kernel, or that anyone considered instruction set choices "tricks."
I can't guess how you would get MMX code generation.
ICL is the name of the Windows version of the compiler.
-mia32 should be recognized by icc 11.0 32-bit only, which would have defaulted to -msse2. icc 10.1 32-bit would default to the equivalent of -mia32.
10.1 32-bit had a partly supported, partly deprecated, undocumented, unreliable backward compatibility option to generate sse but not sse2; it's not clear if that's the code generation you are discussing. 11.0 takes that option as a synonym for -mia32.
You may be able to "trick" the 64-bit icc into generating x87 code by -O0, or by long double data types, or partially by -mp -vec-.
-m32 is a gcc option, not supported by icc, which causes a switch to the 32-bit compiler, but doesn't itself choose instruction set.
As you can see, no one can guess the details of your concern very well. Perhaps you could show an example.
Tim,
I think the problem is some of the data movement and/or data initialization (with optimizations) is using the xmm registers. Not for floating point purposes. Because the standard application level xmm register usage rules declare a subset of xmm registers as free, the compiler assumes it is building for an application and not a kernel routine, and as a result stomps on a register.
The old Borland Turbo C had an interrupt keyword that you could place on a function (returning void) that would disable the "free registers" rule, but it had other limitations.
The user is asking for an option, #pragma, __declspec or something similar that declares that these xmm registers are not free, so the compiler must preserve them or not use them.
Jim Dempsey
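To make Jim's point concrete, here is a minimal sketch of source code that contains no floating point at all, yet can plausibly end up using xmm registers once optimization is enabled. The struct and function names are hypothetical and chosen only for illustration. Whether a given icc version actually emits SSE moves for this code is an assumption that should be checked in the generated assembly (for example with -S).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 64-byte structure, used only for illustration. */
struct packet {
    unsigned char payload[64];
};

/* Plain struct assignment. There is no floating point in the source,
 * but an optimizing compiler is free to implement the copy with
 * 128-bit loads and stores through xmm registers. */
void copy_packet(struct packet *dst, const struct packet *src)
{
    assert(dst != NULL && src != NULL);
    *dst = *src;
}

/* Zeroing an object. A compiler may clear it with xmm stores or by
 * calling an optimized memset-style run-time routine. */
void clear_packet(struct packet *p)
{
    struct packet zero = {{0}};
    assert(p != NULL);
    *p = zero;
}
```

In kernel-style code, where the SIMD state is not saved on entry, either function would be a problem if the compiler chose the xmm path.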
The original MMX implementations used the same register space as was allocated to x87 stack, with special macros required for switching between x87 and MMX modes. As that was so long ago, on CPU architectures which no longer get much testing, if the kernel source is using MMX explicitly, I could imagine bugs being exposed. If that's the question, current icc is not the ideal compiler for supporting CPUs prior to P4. I do remember when P-II was the greatest and latest, and almost no one attempted to use Intel compilers.
tim, jim:
I think that the problem is that we cannot make ICC output only x86 code that does not touch the SIMD registers. Part of the issue is also the inability to use memset() from the CRT without bringing in the CPU dispatcher and the whole of printf() with it.
That limits what ICC can be used for, and I believe that we should have greater control over the code output.
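One workaround sometimes used in freestanding code is to supply your own memory routines instead of linking the CRT versions. The sketch below is an assumption about how that could look, not something confirmed in this thread: k_memset is a hypothetical name, and the loop uses only integer operations in the source. Note that some optimizing compilers recognize such loops and turn them back into a memset call, so the generated code still has to be inspected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical replacement for memset(): a byte-wise fill written
 * with integer operations only. It is slow but has no dependency
 * on the run-time library or on any CPU feature check. */
void *k_memset(void *dst, int value, size_t n)
{
    unsigned char *p = dst;

    assert(dst != NULL || n == 0);
    while (n--) {
        *p++ = (unsigned char)value;
    }
    return dst;
}
```

The trade-off is performance: a byte loop is much slower than a tuned library routine for large buffers.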
Quoting - Igor Levicki
tim, jim:
I think that the problem is that we cannot make ICC output only x86 code that does not touch the SIMD registers. Part of the issue is also the inability to use memset() from the CRT without bringing in the CPU dispatcher and the whole of printf() with it.
That limits what ICC can be used for, and I believe that we should have greater control over the code output.
I think the number of customers who would prefer run-time library paths to be limited to architectures specified in the compile switch (avoiding CPU dispatch) may be underestimated. Such ideas have been raised but haven't gone far. It's hard to foresee all the implications in run-time library maintenance, but the QA implications of not knowing which run-time will be in use also are serious.
Quoting - tim18
In my limited experience with it, -fno-builtin prevents the substitution of _intel_fast_mem... run-time functions. As we've seen on this forum, it is possible to abuse that option.
I think the number of customers who would prefer run-time library paths to be limited to architectures specified in the compile switch (avoiding CPU dispatch) may be underestimated. Such ideas have been raised but haven't gone far. It's hard to foresee all the implications in run-time library maintenance, but the QA implications of not knowing which run-time will be in use also are serious.
Yes, but:
1. There is no such thing as -fno-builtin on Windows. Why can't we have that switch?
2. If I specify that I want code for the Penryn CPU (-QxS), why can't the compiler pull in just the dispatched functions for the Penryn code path instead of the dispatcher and the CPU checking code? Why can't I override its decision if I know the target system is going to have the right CPU?
3. Why can't I force the compiler to fall back to x86/FPU code generation if I want to avoid SIMD state and register usage issues in embedded projects?
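For readers unfamiliar with what the "dispatcher" in question 2 does, the following is a generic sketch of run-time CPU dispatch through a function pointer. It is not Intel's implementation. Every name here (dispatched_fill, cpu_has_wide_path and so on) is hypothetical, the feature probe is a stub where a real one would execute CPUID, and the "wide" path is a stand-in that simply reuses the generic code.

```c
#include <stddef.h>

typedef void *(*fill_fn)(void *, int, size_t);

/* Baseline path: integer-only byte fill. */
static void *fill_generic(void *dst, int value, size_t n)
{
    unsigned char *p = dst;
    while (n--) {
        *p++ = (unsigned char)value;
    }
    return dst;
}

/* Stand-in for a SIMD-tuned path; here it reuses the generic code. */
static void *fill_wide(void *dst, int value, size_t n)
{
    return fill_generic(dst, value, n);
}

/* Hypothetical feature probe; a real one would query CPUID. */
static int cpu_has_wide_path(void)
{
    return 0;
}

static void *fill_resolve(void *dst, int value, size_t n);

/* Starts at the resolver, then is rebound to the chosen path. */
static fill_fn fill_impl = fill_resolve;

/* First call: pick a path once, then forward the call to it. */
static void *fill_resolve(void *dst, int value, size_t n)
{
    fill_impl = cpu_has_wide_path() ? fill_wide : fill_generic;
    return fill_impl(dst, value, n);
}

/* Public entry point: every call goes through the pointer. */
void *dispatched_fill(void *dst, int value, size_t n)
{
    return fill_impl(dst, value, n);
}
```

The sketch shows why a single library call can drag in the probe and every code path: the linker cannot know which branch will be taken at run time. It is also not thread-safe as written.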
Quoting - Igor Levicki
2. If I specify that I want code for the Penryn CPU (-QxS), why can't the compiler pull in just the dispatched functions for the Penryn code path instead of the dispatcher and the CPU checking code? Why can't I override its decision if I know the target system is going to have the right CPU?
Quoting - Igor Levicki
tim, jim:
I think that the problem is that we cannot make ICC output only x86 code that does not touch the SIMD registers. Part of the issue is also the inability to use memset() from the CRT without bringing in the CPU dispatcher and the whole of printf() with it.
That limits what ICC can be used for, and I believe that we should have greater control over the code output.
Igor,
You are right, my question was how to make ICC output only x86 code without SSE/MMX instructions and without touching the xmm registers.
As I understand from the discussion, there is no such option for ICC 10.1 or 11.0. Correct me if I'm wrong.
Quoting - Igor Levicki
tim, jim:
I think that the problem is that we cannot make ICC output only x86 code that does not touch the SIMD registers. Part of the issue is also the inability to use memset() from the CRT without bringing in the CPU dispatcher and the whole of printf() with it.
That limits what ICC can be used for, and I believe that we should have greater control over the code output.
Igor,
That was the point I was trying to make. The programmer should have the capability to "exclude" processor features, processor detection code and other fluff routines such as printf.
Jim
Quoting - Igor Levicki
Yes, but:
1. There is no such thing as -fno-builtin on Windows. Why can't we have that switch?
2. If I specify that I want code for the Penryn CPU (-QxS), why can't the compiler pull in just the dispatched functions for the Penryn code path instead of the dispatcher and the CPU checking code? Why can't I override its decision if I know the target system is going to have the right CPU?
3. Why can't I force the compiler to fall back to x86/FPU code generation if I want to avoid SIMD state and register usage issues in embedded projects?
The equivalent option on Windows for -fno-builtin is /Oi-. It inhibits the compiler from doing some transformations, like transforming memcpy() to _intel_fast_memcpy().
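As a small illustration of the kind of call this option affects, the sketch below contains an explicit memcpy() call. The struct and function names are hypothetical. Per the posts above, building with -fno-builtin (Linux) or /Oi- (Windows) should stop the compiler from substituting _intel_fast_memcpy() for the call. Which routine a particular compiler version actually emits is an assumption to verify by checking the object file for the symbol it references.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size record, used only for illustration. */
struct header {
    unsigned char bytes[32];
};

/* An explicit library call. With builtins enabled, the compiler may
 * inline the copy or redirect it to an optimized run-time routine.
 * With builtins disabled, it should remain a plain call to memcpy(). */
void copy_header(struct header *dst, const struct header *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}
```

Disabling builtins only changes which copy routine is called. It does not by itself guarantee that the routine avoids SIMD registers.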
Quoting - Feilong H (Intel)
The equivalent option on Windows for -fno-builtin is /Oi-. It inhibits the compiler from doing some transformations, like transforming memcpy() to _intel_fast_memcpy().
I am aware of that. What I am still not sure of is in exactly which situations the CPU dispatcher code and printf get included. Is there a list of CRT library functions that rely on a CPU check?