Are micro-instructions (micro-ops) non-destructive? If so, wouldn't it make sense to fuse an assignment and a dependent arithmetic instruction into one?
a = b; a += c; -> a = b + c;
This would make up for x86's lack of non-destructive instructions. Of course, compilers would have to be made aware that emitting these instructions as an adjacent pair is faster, but that seems to be a simple matter of defining a non-destructive instruction pattern that is implicitly encoded as two legacy instructions.
So I wonder if there's any reason not to do this...
8B C3        mov eax, ebx
03 C1        add eax, ecx
->
8B C3 03 C1  add eax, ebx, ecx
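To make the proposal concrete, here is a rough C sketch of the pattern check a decoder would need. It recognizes the two legacy encodings above (a register-to-register mov, opcode 8B, followed by a register-to-register add, opcode 03, that writes the same register) and reports them as one three-operand add. The fused_add type and match_mov_add function are hypothetical names for illustration. This is not how any shipping x86 decoder is documented to work, and it ignores prefixes, other opcodes and memory operands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fused three-operand micro-op: dst = src1 + src2.
   Register numbers are the 3-bit ModRM fields: 0 = eax, 1 = ecx, 2 = edx, 3 = ebx, ... */
typedef struct { int dst, src1, src2; } fused_add;

/* Returns 1 and fills *out if p[0..3] is "mov r32, r32" (opcode 8B, ModRM mod = 11)
   immediately followed by "add r32, r32" (opcode 03, ModRM mod = 11) that writes the
   same destination register; returns 0 otherwise. */
static int match_mov_add(const uint8_t *p, size_t len, fused_add *out) {
    assert(p != NULL || len == 0);
    if (len < 4) return 0;
    if (p[0] != 0x8B || (p[1] >> 6) != 3) return 0;   /* not a reg-reg mov */
    if (p[2] != 0x03 || (p[3] >> 6) != 3) return 0;   /* not a reg-reg add */

    int mov_dst = (p[1] >> 3) & 7, mov_src = p[1] & 7; /* 8B /r: reg field is the destination */
    int add_dst = (p[3] >> 3) & 7, add_src = p[3] & 7; /* 03 /r: reg field is the destination */
    if (add_dst != mov_dst) return 0;                  /* the add must consume the copy */

    out->dst = mov_dst;
    out->src1 = mov_src;
    /* "mov eax, ebx; add eax, eax" adds the copied value to itself, so read ebx twice. */
    out->src2 = (add_src == mov_dst) ? mov_src : add_src;
    /* The value written by the mov is overwritten by the add, so the fused op only
       has to produce the add's register result and its flags. */
    return 1;
}
```

With the bytes 8B C3 03 C1 from the example, match_mov_add reports dst = 0 (eax), src1 = 3 (ebx), src2 = 1 (ecx), i.e. add eax, ebx, ecx.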
I know. I'm specifically talking about the scalar instructions. In discussions about other architectures, people claimed that x86 is crippled by its lack of non-destructive instructions and will never be able to make up for it (without a drastic redesign or lots of extra hardware that consumes more power). But since it's already largely a RISC architecture internally anyway, I wondered whether simply executing a move and an arithmetic operation as one instruction would make things more efficient at minimal cost.
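In some cases the problem can be circumvented by making a copy but then modifying the original instead of the copy (so that original and copy swap roles), and relying on superscalar execution: the move and the arithmetic instruction then both read the same source register and can execute in parallel.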
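The role swap can be illustrated with a toy dependency model in C, assuming 1-cycle latency for every instruction, unlimited execution units and register renaming (so only read-after-write dependences count). The register numbering and the names insn and critical_path are made up for this sketch. Some newer cores can eliminate register-to-register moves at rename, which this model deliberately ignores.

```c
#include <stddef.h>

/* Toy register-to-register instruction: dst = op(src1, src2); src2 < 0 means no second source. */
typedef struct { int dst, src1, src2; } insn;

enum { NREGS = 8 };

/* Length in cycles of the longest read-after-write chain, assuming every instruction has
   1-cycle latency, unlimited execution units and renamed registers, so only true
   (read-after-write) dependences delay an instruction. */
static int critical_path(const insn *code, size_t n) {
    int ready[NREGS] = {0};  /* cycle at which each register's newest value is available */
    int longest = 0;
    for (size_t i = 0; i < n; i++) {
        int start = ready[code[i].src1];
        if (code[i].src2 >= 0 && ready[code[i].src2] > start)
            start = ready[code[i].src2];
        int done = start + 1;
        ready[code[i].dst] = done;
        if (done > longest) longest = done;
    }
    return longest;
}

/* Hypothetical register numbering for the example: a = 0, c = 1, t = 2. */
enum { REG_A = 0, REG_C = 1, REG_T = 2 };

/* mov t, a ; add t, c    (the copy is modified: the add must wait for the mov) */
static const insn modify_copy[] = { {REG_T, REG_A, -1}, {REG_T, REG_T, REG_C} };

/* mov t, a ; add a, c    (the original is modified: both only read a and c) */
static const insn modify_original[] = { {REG_T, REG_A, -1}, {REG_A, REG_A, REG_C} };
```

In the modify-original version, t ends up holding the old value of a and a holds a + c, so the following code simply uses the two names the other way round; the model reports a chain of 1 cycle instead of 2.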
A related case might be that some "complex" instructions such as jecxz, loop, enter (with nesting level 0) and leave are so slow even though their meaning is almost trivial, and they are easily outperformed by a sequence of simpler instructions. Why? Probably for the same reasons that gluing a mov and an arithmetic instruction together is not done:
It seems that simple RISC-like instructions are still much faster than microcoded ones, and the effort of turning a glued or "complex" instruction into a new simple one is judged too high for the expected gain.
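To show how trivial the meaning of these instructions is, here is a toy C model of their register and flag effects next to the simple-instruction sequences that usually replace them. The cpu struct, the 64-word stack and all function names are hypothetical, and the model says nothing about timing. One visible difference is that the replacement sequences also write flags, which LOOP and JECXZ do not.

```c
#include <stdint.h>

/* Toy 32-bit machine state with only what this sketch needs (hypothetical). */
typedef struct {
    uint32_t ecx, esp, ebp;
    int zf;                 /* zero flag */
    uint32_t stack[64];     /* toy stack memory; esp/ebp are byte offsets into it */
} cpu;

static void push(cpu *c, uint32_t v) { c->esp -= 4; c->stack[c->esp / 4] = v; }
static uint32_t pop(cpu *c) { uint32_t v = c->stack[c->esp / 4]; c->esp += 4; return v; }

/* LOOP: ECX -= 1, branch if ECX != 0; flags are left untouched. */
static int op_loop(cpu *c) { c->ecx -= 1; return c->ecx != 0; }

/* DEC ECX ; JNZ: same branch decision, but DEC also writes ZF (and SF, OF, AF, PF). */
static int op_dec_jnz(cpu *c) { c->ecx -= 1; c->zf = (c->ecx == 0); return !c->zf; }

/* JECXZ: branch if ECX == 0; flags are left untouched. */
static int op_jecxz(const cpu *c) { return c->ecx == 0; }

/* TEST ECX, ECX ; JZ: same branch decision, but TEST writes ZF (and SF, PF; clears CF, OF). */
static int op_test_jz(cpu *c) { c->zf = (c->ecx == 0); return c->zf; }

/* ENTER n, 0 behaves like PUSH EBP ; MOV EBP, ESP ; SUB ESP, n */
static void op_enter0(cpu *c, uint32_t n) { push(c, c->ebp); c->ebp = c->esp; c->esp -= n; }

/* LEAVE behaves like MOV ESP, EBP ; POP EBP */
static void op_leave(cpu *c) { c->esp = c->ebp; c->ebp = pop(c); }
```

For example, op_enter0 followed by op_leave restores the original esp and ebp, and a do/while loop driven by op_loop runs its body ecx times for a nonzero starting ecx.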
Let's see what the next generations will bring...
With the introduction of ANDN, BEXTR, RORX, SARX, SHLX and SHRX (the BMI1/BMI2 extensions), these new non-destructive instructions effectively solve our problem for some special cases, albeit only where the compiler actually emits them.
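As a rough illustration, the following portable C functions spell out what five of these instructions compute (SARX is left out because right-shifting a negative signed value is implementation-defined in C). The function names are made up. With GCC or Clang and -mbmi/-mbmi2 the compiler may select the corresponding instructions for such expressions, but that depends on the compiler version and flags, so check the generated assembly.

```c
#include <stdint.h>

/* Portable C versions of what some BMI1/BMI2 instructions compute. Each result goes to a
   separate destination, so neither source is overwritten (unlike two-operand AND/SHL/ROR). */

/* ANDN dst, src1, src2 : dst = ~src1 & src2 (BMI1) */
static uint32_t andn32(uint32_t src1, uint32_t src2) { return ~src1 & src2; }

/* BEXTR dst, src, ctrl : extract len bits starting at bit start (BMI1).
   The hardware packs start and len into one register; they are separate arguments here. */
static uint32_t bextr32(uint32_t src, uint32_t start, uint32_t len) {
    if (start >= 32 || len == 0) return 0;   /* no bits extracted: result is zero */
    uint32_t v = src >> start;
    if (len >= 32) return v;                 /* all remaining bits */
    return v & ((1u << len) - 1u);
}

/* RORX dst, src, imm : rotate right by imm without touching flags (BMI2). */
static uint32_t rorx32(uint32_t src, unsigned imm) {
    unsigned r = imm & 31;
    return r ? (src >> r) | (src << (32 - r)) : src;
}

/* SHLX / SHRX dst, src, cnt : shift by a count in any register, not only CL, without
   touching flags (BMI2). Only the low 5 bits of the count are used for 32-bit operands. */
static uint32_t shlx32(uint32_t src, uint32_t cnt) { return src << (cnt & 31); }
static uint32_t shrx32(uint32_t src, uint32_t cnt) { return src >> (cnt & 31); }
```

Because each function returns a fresh value, d = andn32(a, b) matches the a = b + c style from the question: neither a nor b is overwritten.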
Could it be that AMD, for example, already implemented your proposal, possibly years ago?
Has anyone benchmarked this on processors other than the i7 and the Atom N450?