Hello. I want to propose two optimizations that the MSVC compiler performs but ICC does not. All tests were done on Windows x86.
1) Combining several small movs into one. Example:
struct struct_t { char a, b, c, d; };

void __declspec(noinline) test(struct_t& s)
{
    s.a = 'a';
    s.b = 'b';
    s.c = 'c';
    s.d = 'd';
}
Code generated by the current ICC with -Ox:
mov BYTE PTR [eax], 97
mov BYTE PTR [1+eax], 98
mov BYTE PTR [2+eax], 99
mov BYTE PTR [3+eax], 100
ret
These four byte movs can be combined into a single dword mov, as MSVC does:
mov DWORD PTR [ecx], 1684234849 ; 64636261H
ret
2) Eliminating useless copies from volatile memory to registers. I believe this is correct, and MSVC performs this optimization. Example:
#include <stdio.h>

bool isInt(int) { return true; }
bool isInt(short) { return false; }

void __declspec(noinline) test()
{
    volatile int a = 5;
    volatile short b = 2;
    printf("int = %i, short = %i\n", isInt(a), isInt(b));
}
Result:
sub esp, 8
mov eax, 2
mov DWORD PTR [esp], 5
mov WORD PTR [4+esp], ax
mov edx, DWORD PTR [esp]      ; <- unnecessary copy
movzx ecx, WORD PTR [4+esp]   ; <- unnecessary copy
; copies from edx and ecx to non-volatile memory were here, but they were eliminated as dead code
push 0
push 1
push OFFSET "int = %i, short = %i\n"
call DWORD PTR [__imp__printf]
add esp, 12
add esp, 8
ret
Here we can also see two uncombined adds (add esp, 12 and add esp, 8) before the return.
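For suggestion (1), the equivalence MSVC relies on can be checked in portable C++. This is a minimal sketch, assuming a little-endian target such as x86 and a `struct_t` with no padding; `store_bytes` and `store_combined` are hypothetical names, and `std::memcpy` is simply the portable way to spell the wide store (compilers typically lower a fixed 4-byte memcpy to a single mov).

```cpp
#include <cstdint>
#include <cstring>

struct struct_t { char a, b, c, d; };

// Byte-by-byte version, as in the test() function above.
void store_bytes(struct_t& s) {
    s.a = 'a';
    s.b = 'b';
    s.c = 'c';
    s.d = 'd';
}

// Hand-combined version: one 32-bit store of 0x64636261 (= 1684234849).
// Assumes a little-endian target, so the lowest byte (0x61, 'a') lands at offset 0.
void store_combined(struct_t& s) {
    static_assert(sizeof(struct_t) == 4, "struct_t must be exactly 4 bytes");
    const std::uint32_t packed = 0x64636261u;  // 'd','c','b','a' from high byte to low byte
    std::memcpy(&s, &packed, sizeof packed);
}
```

On a little-endian machine both functions leave the same four bytes in `s`; a big-endian target would need the constant 0x61626364 instead, which is one reason this transformation is target-specific.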
volatile requires the memory to be referenced. Imagine the address were an I/O port, e.g. a mouse register: you wouldn't see the mouse move if you kept reading the copy held in a register.
Jim Dempsey
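Jim's point can be illustrated with a polling loop. This is a sketch assuming a hypothetical memory-mapped status register whose bit 0 (`kReadyBit`, also hypothetical) signals that data is ready; in the usage below a plain volatile variable stands in for the device. Because `status` points to volatile memory, every iteration must perform a fresh load, and the compiler may not hoist the read out of the loop.

```cpp
#include <cstdint>

// Hypothetical bit layout of a hypothetical device status register.
constexpr std::uint32_t kReadyBit = 1u;

// Spins until the ready bit is set, giving up after max_polls reads.
// Returns how many reads were needed, or 0 on timeout.
std::uint32_t wait_until_ready(const volatile std::uint32_t* status,
                               std::uint32_t max_polls) {
    for (std::uint32_t i = 0; i < max_polls; ++i) {
        if (*status & kReadyBit) {  // volatile access: a new load every iteration
            return i + 1;
        }
    }
    return 0;
}
```

If `status` were not volatile, the compiler could legally read it once and spin on the stale register copy, which is exactly the "mouse never moves" failure described above.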
We've entered suggestion (1) in our internal bug-tracking database as DPD200415370. There are potential alignment and store-forwarding issues to deal with, but I believe we currently lack the fundamental capability to do this, even when we know the resulting 32-bit store will be aligned and when targeting architectures where subsequent small loads will all forward.
As for (2), we asked our language expert, and he said:
(1) Is the compiler required to load volatiles a and b in the program below due to the calls to isInt(a) & isInt(b)?
Yep. They don't strictly have to be moved to any particular registers, but they have to be loaded.
(2) Would the answer be different if the arguments to isInt were named?
Nope. The references to a and b aren't inside either of the functions named isInt.
Perhaps we're not using the same switches or compiler version, but when we look at the assembly, we see that Microsoft's compiler loads them as well.
Thanks for the suggestions.
Judy
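The language expert's answer hinges on the difference between evaluating `a` and merely asking about its type. In the sketch below (the helpers `evaluated` and `is_int_type` are hypothetical names), passing the volatile to `isInt` reads the object, and that read is a side effect the compiler must keep even though `isInt` ignores its argument. An unevaluated operand such as `decltype` answers the same type question without accessing memory at all.

```cpp
#include <type_traits>

bool isInt(int) { return true; }
bool isInt(short) { return false; }

// Evaluated call: converting the volatile lvalue to the parameter type
// reads the object, so the load cannot be removed.
bool evaluated(volatile int& a) { return isInt(a); }

// Type-only query: T is typically supplied via decltype, whose operand is
// unevaluated, so no access to the volatile object takes place.
template <typename T>
constexpr bool is_int_type() {
    return std::is_same<typename std::remove_cv<T>::type, int>::value;
}
```

Used as `is_int_type<decltype(a)>()`, this yields the same true/false answers as `isInt(a)` and `isInt(b)` in the test program, but as compile-time constants with no loads.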
One of my favorite low-level optimizations comes from gcc. When I do low-level timer or performance counter work, my code is full of performance counter reads that the Intel compiler turns into code like:
RDPMC # read the performance counter selected by %ecx: low 32 bits in %eax, high-order bits (16 for a 48-bit counter) in %edx
MOVL %edx,%edx # I guess this is to ensure that the high-order bits are clear? Maybe "executed" in the renamer?
MOVL %eax,%eax # I guess this is to ensure that the high-order bits are clear? Maybe "executed" in the renamer?
SHLQ $32,%rdx # shift the high-order bits into place
ORQ %rdx,%rax # combine the upper and lower bits into a single 64-bit register
MOVQ %rax,(%rsp) # store the 64-bit value
I had to laugh out loud when I saw what gcc did with the code. Basically, it replaced the zero-extending moves, the shift, the OR, and the 64-bit store with two plain 32-bit stores, something like:
RDPMC
MOV %eax,(%rsp)
MOV %edx,4(%rsp)
This may not be any faster than the Intel-generated code -- but it was funny to have a compiler call me an idiot.
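gcc's rewrite works because x86-64 is little-endian: writing the low half at offset 0 and the high half at offset 4 produces the same eight bytes as shifting, ORing, and doing one 64-bit store. A portable sketch of that equivalence follows; the function names are hypothetical, and it takes the two halves as ordinary arguments rather than executing RDPMC.

```cpp
#include <cstdint>
#include <cstring>

// Intel-style: build the 64-bit value in a register, then store it once.
std::uint64_t combine_shift_or(std::uint32_t lo, std::uint32_t hi) {
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// gcc-style: store the two 32-bit halves side by side in memory.
// Assumes a little-endian target, where the low half belongs at offset 0.
std::uint64_t combine_two_stores(std::uint32_t lo, std::uint32_t hi) {
    unsigned char buf[8];
    std::memcpy(buf, &lo, 4);      // like MOV %eax,(%rsp)
    std::memcpy(buf + 4, &hi, 4);  // like MOV %edx,4(%rsp)
    std::uint64_t out;
    std::memcpy(&out, buf, sizeof out);
    return out;
}
```

Both functions return the same value on x86-64; only the second one depends on byte order.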
