I was wondering why the common floating-point SSE instructions (e.g. movps/ss, addps/ss, mulps/ss) don't have variants that take immediate operands.
As it stands, any constant must either be loaded from memory (slow, even if cached) or, if it is used inside a loop,
copied into a register before the loop, which ties up one register.
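To make the second option concrete, here is a minimal sketch in C with SSE intrinsics. The function name `scale_floats` and the factor 0.5 are made up for illustration, and the instruction names in the comments are only what a compiler would typically emit, not a guarantee.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: multiplies n floats in place by a constant factor.
   The constant is materialized once, before the loop, and then occupies
   one XMM register for the whole loop. */
void scale_floats(float *data, size_t n)
{
    /* Hoisted out of the loop; typically a single load from memory. */
    const __m128 factor = _mm_set_ss(0.5f);

    for (size_t i = 0; i < n; ++i) {
        __m128 x = _mm_load_ss(&data[i]);  /* typically movss xmm, m32 */
        x = _mm_mul_ss(x, factor);         /* typically mulss xmm, xmm */
        _mm_store_ss(&data[i], x);         /* typically movss m32, xmm */
    }
}
```

Without the hoisting, the multiply would have to take the constant as a memory operand on every iteration, which is the first option described above.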
Are immediate operands hard to implement in hardware, or is there some other problem?
Given that the immediate forms of the integer instructions are more efficient than loading the operand from memory,
wouldn't the same be true for hypothetical immediate-operand SSE instructions?
Of course, I am talking about scalar operands (vector ones are too large to be immediates),
and probably only 32-bit ones (single-precision floats, not doubles). But that would still be much better than nothing.
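As a rough illustration of what such a 32-bit immediate would have to carry, the sketch below shows the bit pattern of a single-precision float and builds a scalar float from a 32-bit integer constant routed through a general-purpose register. The helper names `float_bits` and `float_from_imm32` are hypothetical, and I am assuming x86 with SSE2; whether a compiler really emits a mov with an immediate followed by movd for this is up to the compiler.

```c
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>  /* SSE2: _mm_cvtsi32_si128, _mm_castsi128_ps */

/* Returns the 32-bit pattern that encodes the float f. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* reinterpret the bytes, no conversion */
    return bits;
}

/* Hypothetical helper: turns a 32-bit integer constant into a scalar float
   by moving it from an integer register into an XMM register.
   Intended for patterns that fit in a non-negative int, as in the usage below. */
float float_from_imm32(uint32_t imm)
{
    __m128i v = _mm_cvtsi32_si128((int)imm);    /* typically movd xmm, r32 */
    return _mm_cvtss_f32(_mm_castsi128_ps(v));  /* read back the low lane */
}
```

For example, `float_bits(1.5f)` is `0x3FC00000`, and `float_from_imm32(0x3FC00000u)` gives back 1.5, so a single-precision constant does fit in the same 32 bits that integer instructions already accept as an immediate.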
Even in vectorized code it is very common to use a constant factor that is the same for all channels,
or the same additive (bias) constant for all channels. So I think immediates would be very useful for the vector instructions too.
(They could take a 32-bit immediate operand, which would then be replicated to all channels.)
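For comparison, this is roughly how the "same constant in all channels" case looks today with intrinsics: a sketch, with the function name `scale_and_bias` and the constants 0.5 and 1.0 chosen only for illustration. `_mm_set1_ps` does the replication that a broadcasting immediate would do, but the two constants still occupy two XMM registers for the duration of the loop.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: data[i] = data[i] * 0.5 + 1.0, four floats at a time.
   Only complete groups of four are processed; a tail of fewer than four
   elements is left untouched. */
void scale_and_bias(float *data, size_t n)
{
    const __m128 gain = _mm_set1_ps(0.5f);  /* 0.5 replicated to all four lanes */
    const __m128 bias = _mm_set1_ps(1.0f);  /* 1.0 replicated to all four lanes */

    for (size_t i = 0; i + 4 <= n; i += 4) {
        __m128 x = _mm_loadu_ps(&data[i]);          /* unaligned load of 4 floats */
        x = _mm_add_ps(_mm_mul_ps(x, gain), bias);  /* mulps, then addps */
        _mm_storeu_ps(&data[i], x);                 /* unaligned store of 4 floats */
    }
}
```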