- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
Are micro-instructions non-destructive? If so, wouldn't it make sense to fuse an assignment and dependent arithmetic instruction into one?
a = b; a += c; -> a = b + c;
This would make up for x86's lack of non-destructive instructions. Of course compilers would have to be made aware that pairing these instructions is faster, but that seems to be a simple case of defining a non-destructive instruction pattern which is implicitly encoded as two legacy instructions.
So I wonder if there's any reason not to do this...
Cheers,
Nicolas
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
8B C3 mov eax, ebx03 C1 add eax, ecx
8B C303 C1 add eax, ebx, ecx
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
ForFP instructions, AVX already provides you a solution: The VEX prefix allows a non-destructive operand, for example VADDSD xmm1, xmm2, xmm3.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
ForFP instructions, AVX already provides you a solution: The VEX prefix allows a non-destructive operand, for example VADDSD xmm1, xmm2, xmm3.
I know. I'm specifically talking about the scalar instructions. In discussions about other architectures, people claimed that x86 is crippled by the lack of non-destructive instructions and will never be able to make up for it (without a drastic redesign or lots of extra hardware which consumes more power). But since it's already largely a RISC architecture internally anyway I wondered whether simply executing a move and arithmetic operation as one instruction would make things more efficient at a minimal cost.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
In some cases there is the possibility to circumvent the problem by making a copy but modifying the original (thereby letting original and copy change roles) and relying on superscalar execution.
A related case might be that some "complex" commands such as jecxz, loop, enter (level 0), leave are so slow although their meaning is almost trivial and they are easily outperformed by a sequence of other commands. Why? Probably because of the same reasons the gluing of mov and an arithmetic command is not performed:
It seems that RISC commands are still much faster than micro coded ones and the effort to make a glued or "complex" command a new RISC command is estimated too high for the expected gain.
Let's see what the next generations will bring...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
With the introduction of ANDN, BEXTR, RORX, SARX, SHLX, SHRX these new commands effectively solve our problem for some special cases, albeit with the aid of the compiler.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Could it be that e.g. AMD has already implemented your proposal possibly years ago?
Has anyone done any benchmarking on other processors than i7 and Atom N450?
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page