Félix_C_ · ‎04-27-2015

Hi all,

Volume 2A currently says that imul sets SF according to the sign bit of the truncated result:

SF is updated according to the most significant bit of the operand-size-truncated result in the destination. For the one operand form of the instruction, the CF and OF flags are set when significant bits are carried into the upper half of the result and cleared when the result fits exactly in the lower half of the result. For the two- and three-operand forms of the instruction, the CF and OF flags are set when the result must be truncated to fit in the destination operand size and cleared when the result fits exactly in the destination operand size. The ZF, AF, and PF flags are undefined.

So I had quite the surprise when running this on my MacBook Pro from 2010 (i7 M620)…

mov   rax, 0x9090909090909095
mov   rdx, 0x4040404040404043
imul  rax, rdx

…gave me a cleared SF flag, even though the most significant bit of rax is set.
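
As a cross-check of the arithmetic, here is a minimal Python sketch of what the quoted documentation describes for the two-operand form. It models the documented rule only, not what any particular CPU does; the function names `to_signed` and `imul_two_operand_model` are made up for this illustration.

```python
def to_signed(value, bits=64):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def imul_two_operand_model(dst, src, bits=64):
    """Model 'imul dst, src' as the quoted documentation describes it.

    Returns (truncated_result, sf, overflow), where overflow stands for
    both CF and OF, which the two-operand form sets together.
    """
    full = to_signed(dst, bits) * to_signed(src, bits)   # exact signed product
    truncated = full & ((1 << bits) - 1)                 # what lands in dst
    sf = truncated >> (bits - 1)                         # MSB of truncated result
    overflow = int(to_signed(truncated, bits) != full)   # result did not fit
    return truncated, sf, overflow


result, sf, overflow = imul_two_operand_model(0x9090909090909095,
                                              0x4040404040404043)
print(hex(result), "SF =", sf, "CF/OF =", overflow)
```

Under this model the truncated result has its top bit set and the product does not fit in 64 bits, so the documented behavior would be SF = 1 with CF and OF set.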

Digging further, I found that the documentation for imul changed in September 2014: before then, it said that SF was left undefined. However, it does not say whether this change reflects exact but previously undocumented behavior of Intel CPUs, or whether this is new behavior that not all Intel CPUs are expected to implement.

Does anyone know anything about this?

(I asked on Stack Overflow too. Also, it's kind of ironic that Intel's markup system doesn't support x86 assembly.)

zalia64 · ‎05-05-2015

You are using a two-register IMUL instruction. With your numbers, the result cannot be contained in a 64-bit register. The Carry and Overflow flags are set, and the returned value is the 64 least significant bits of a 128-bit result.

Of course, talking about "the Sign Bit of the lowest-64-bits of the results" is a misconception. The most significant bit of the returned value has nothing to do with the sign of the result. I guess that the SF blindly reflects the setting of the most-significant bit of the lowest 64-bit part. In any case, your complaint is misplaced.

Félix_C_ · ‎05-05-2015

Are you sure that you understood my "complaint" before calling it misplaced? I'm saying that there is a mismatch between the official documentation (that part in bold that you called a misconception) and what I observe on my computer.

Regardless, 0x9090909090909095 is a negative number, but 0x4040404040404043 isn't. Simple maths remind us that multiplying a positive number with a negative number has a negative result. In other words, the accurate mathematical result is negative, the truncated result is also negative, but the sign flag still isn't set. I'm not sure what else should be considered in setting SF.

SHIH_K_Intel · ‎05-14-2015

Your question is really about the SF update behavior of the one-operand form of "imul r64".

What you show as "imul rax, rdx" resembles how a debugger may disassemble the one-operand form of imul as an aid in a debug environment.

The SF behavior of "imul r64", calculating a 128-bit product, is actually no different from "imul r32" calculating a 64-bit product stored in EDX:EAX. And similarly for "imul r16" producing a 32-bit product in DX:AX.

One way to understand how the hardware behaves is to realize that multiplying two input integers (actually, each is just a sequence of bits) into a product of up to twice as many bits is a separate operation from the arithmetic sign operation on an "operand".

The key to updating the SF is that this is an arithmetic sign operation on a register operand. The hardware does not know how to do an arithmetic sign operation on a sequence of bits that does not fit in a register operand.

To my understanding, this is a legacy that started some 30+ years ago, when imul had the "imul r16" form to generate a 32-bit output on 16-bit hardware. This legacy was extended when 32-bit hardware gained the ability to generate a 64-bit product, and again with the introduction of 64-bit hardware.

You may ask why the original designers chose to update SF based on the lower half of the two destination registers. Think about how high-level languages handle signed integer types of various widths. On a 32-bit machine, simple arithmetic operators on int64 have to be compiled into long sequences of instructions, but if the dynamic range of a multiplication result is small enough, it can be accelerated by using imul. In other words, had the choice been made to update SF from the high half of the two implicit register operands, it would have deprived lower-dynamic-range integer arithmetic of hardware acceleration through the one-operand form of imul.

Félix_C_ · ‎05-14-2015

Thanks Shih, but I really am talking about the two-operand form of imul. Imul has a one-operand form (opcodes f6 and f7), a two-operand form (opcode 0f af) and even a three-operand form (opcodes 6b and 69). My program uses the second form (48 0f af c1, 48 being the REX.W prefix).

I know that the one-operand form places its result in D and A and has a result twice the size of the operands, and the documentation correctly specifies that the sign flag is set from the value of the most significant bit of the lower register (though I did not realize that until you told me). However, it specifies that for the 2- and 3-operand forms, the sign flag is set according to the most significant bit of the final value, which is what concerns me.

For reference, the Volume 2A PDF can be found here. The pages of interest are 3-397 and 3-398 (463-464 if you use your viewer's "Go To Page" functionality).

SHIH_K_Intel · ‎05-27-2015

What I referred to as the one-operand form of "imul reg" is encoded with the one-byte opcode F6 or F7 (with /5 in the ModRM reg field). We refer to it as one-operand because the implicit source operand RAX/EAX/AX/AL does not need an encoding specifier. So the one-operand form uses one implicit register and one explicitly encoded source operand, which can be a register or memory. The output is stored in two implicit destinations, but the SF is updated according to the low-half result in RAX/EAX/AX/AL.
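
To make the one-operand description above concrete, here is a small Python sketch of "imul r64" as described in this post: a full 128-bit product split across RDX:RAX, with SF taken from the low half. It models the description in this thread, not measured hardware behavior; `imul_one_operand_model` and `to_signed` are hypothetical helper names.

```python
def to_signed(value, bits=64):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def imul_one_operand_model(rax, src, bits=64):
    """Model one-operand 'imul src': RDX:RAX = RAX * src (signed).

    Returns (high, low, sf, overflow). SF follows the low half, as the
    post describes; overflow stands for CF and OF, set when the product
    does not fit in the low half alone.
    """
    mask = (1 << bits) - 1
    full = to_signed(rax, bits) * to_signed(src, bits)  # exact signed product
    low = full & mask                                   # goes to RAX
    high = (full >> bits) & mask                        # goes to RDX
    sf = low >> (bits - 1)                              # MSB of the low half
    overflow = int(to_signed(low, bits) != full)
    return high, low, sf, overflow


# 2**62 * 2 = 2**63: a positive product whose low half has its top bit set.
high, low, sf, overflow = imul_one_operand_model(0x4000000000000000, 2)
print(hex(high), hex(low), "SF =", sf, "CF/OF =", overflow)
```

In this example the 128-bit product is positive (the high half is zero), yet the modeled SF is 1 because it only looks at the low half.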

Félix_C_ · ‎05-27-2015

Yes. We're in the clear then, as I use the 2-operands 0FAF form. In this context, the sign flag should be set according to the truncated 64-bit result, shouldn't it?

Behavior of IMUL regarding SF